I am currently a Senior Researcher on the Microsoft DeepSpeed team, working on improving the performance and efficiency of deep learning training and inference (deepspeed.ai, github repo). I have a mixed research background spanning systems and AI, and I have worked in many areas including deep learning, similarity search, distributed caching systems, networking, and computer architecture. Regardless of the area, my approach to research is the same: identify inefficiencies through in-depth analysis, then fix them with algorithm and policy designs. In the deep learning area in particular, over the past few years on the DeepSpeed team I have worked on improving communication efficiency via compression, computation efficiency via Mixture-of-Experts (MoE) modeling, and data efficiency via curriculum learning.
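
For readers unfamiliar with DeepSpeed, below is a minimal sketch of what training with the library typically looks like; the model, data, and configuration values are illustrative placeholders rather than a recommended recipe (see deepspeed.ai for the full configuration reference).

    # Minimal, illustrative sketch of training with DeepSpeed (not a tuned recipe).
    # Requires a CUDA GPU and is typically run via the `deepspeed` launcher;
    # the model, data, and config values below are placeholders.
    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

    ds_config = {
        "train_batch_size": 32,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "fp16": {"enabled": True},            # mixed-precision training
        "zero_optimization": {"stage": 1},    # ZeRO optimizer-state partitioning
    }

    # deepspeed.initialize wraps the model in an engine that manages data
    # parallelism, mixed precision, and optimizer-state partitioning.
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    for step in range(10):
        x = torch.randn(32, 1024, device=engine.device, dtype=torch.half)
        loss = engine(x).float().pow(2).mean()  # dummy objective on random data
        engine.backward(loss)                   # engine-managed backward pass (loss scaling)
        engine.step()                           # optimizer step + gradient clearing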

I received my Ph.D. in Computer Science from Carnegie Mellon University in 2020, advised by Professor David G. Andersen. I received my B.S. (2013) and M.S. (2014) in Computer Science from Rice University, where I was advised by Professor Alan L. Cox and supported by the Graduate Research Fellowship.

Publications

  1. DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
  2. DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
    • Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He.
    • arXiv preprint arXiv:2308.01320. [tutorial][blog]
  3. Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
  4. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
    • Teven Le Scao et al. (391 authors). As a member of the Engineering team, I contributed code and infrastructure for training BLOOM on the Jean Zay supercomputer.
    • arXiv preprint arXiv:2211.05100.
  5. DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
  6. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
  7. Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam.
  8. 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB’s Convergence Speed.
  9. The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
    • Conglong Li, Minjia Zhang, Yuxiong He.
    • In NeurIPS 2022. [tutorial][arxiv]
    • (This paper was previously titled “Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training” in early arXiv preprint versions.)
  10. XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
  11. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
  12. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
    • Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He.
    • In ICML 2022. [tutorial][arxiv]
  13. 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed.
    • Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He.
    • In ICML 2021. [tutorial][arxiv]
  14. Learned Adaptive Accuracy-Cost Optimization for Machine Learning Systems.
  15. Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
  16. Scaling Video Analytics on Constrained Edge Nodes.
    • Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor.
    • In SysML 2019. (This conference was renamed MLSys in 2020.) [source code]
  17. Better Caching in Search Advertising Systems with Rapid Refresh Predictions.
    • Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He.
    • In WWW 2018.
  18. Workload Analysis and Caching Strategies for Search Advertising Systems.
    • Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He.
    • In ACM SoCC 2017.
  19. Using Indirect Routing to Recover from Network Traffic Scheduling Estimation Error.
    • Conglong Li, Matthew K. Mukerjee, David G. Andersen, Srinivasan Seshan, Michael Kaminsky, George Porter, Alex C. Snoeren.
    • In ACM/IEEE ANCS 2017.
  20. Scheduling Techniques for Hybrid Circuit/Packet Networks.
    • He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan, Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, Alex C. Snoeren.
    • In ACM CoNEXT 2015.
  21. GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores.
  22. Reducing DRAM Row Activations with Eager Read/Write Clustering.
    • Myeongjae Jeon, Conglong Li, Alan L. Cox, Scott Rixner.
    • In ACM TACO 2013.
  23. GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores.
    • Conglong Li, Alan L. Cox.
    • In 7th Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2013).

Last updated: 2024/02/26