I’m currently a Senior Researcher on the DeepSpeed team at Microsoft, working on improving the performance and efficiency of deep learning training and inference (deepspeed.ai, GitHub repo). More broadly, I work on improving the performance and resource efficiency of all kinds of computer systems through experimental research, data analysis, and algorithm/policy optimizations. My broad research interests have led to experience and publications in many areas, including deep learning, similarity search, distributed caching systems, networks, and computer architecture.

I received my Ph.D. in Computer Science from Carnegie Mellon University in 2020, advised by Professor David G. Andersen. I received both my B.S. (2013) and M.S. (2014) in Computer Science from Rice University, advised by Professor Alan L. Cox and supported by the Graduate Research Fellowship.


  1. Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
  2. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
  3. Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam.
  4. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
    • Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He.
    • In ICML 2022. [tutorial][arxiv]
  5. Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.
  6. 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB’s Convergence Speed.
  7. 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed.
    • Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He.
    • In ICML 2021. [tutorial][arxiv]
  8. Learned Adaptive Accuracy-Cost Optimization for Machine Learning Systems.
  9. Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
  10. Scaling Video Analytics on Constrained Edge Nodes.
    • Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor.
    • In SysML 2019. (The conference was renamed MLSys in 2020.) [source code]
  11. Better Caching in Search Advertising Systems with Rapid Refresh Predictions.
    • Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He.
    • In WWW 2018.
  12. Workload Analysis and Caching Strategies for Search Advertising Systems.
    • Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He.
    • In ACM SoCC 2017.
  13. Using Indirect Routing to Recover from Network Traffic Scheduling Estimation Error.
    • Conglong Li, Matthew K. Mukerjee, David G. Andersen, Srinivasan Seshan, Michael Kaminsky, George Porter, Alex C. Snoeren.
    • In ACM/IEEE ANCS 2017.
  14. Scheduling Techniques for Hybrid Circuit/Packet Networks.
    • He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan, Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, Alex C. Snoeren.
    • In ACM CoNEXT 2015.
  15. GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores.
  16. Reducing DRAM Row Activations with Eager Read/Write Clustering.
    • Myeongjae Jeon, Conglong Li, Alan L. Cox, Scott Rixner.
    • In ACM TACO 2013.
  17. GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores.
    • Conglong Li, Alan L. Cox.
    • In 7th Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2013).

Last updated: 2022/07/23