- InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers
Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang SIGCOMM'25. To appear.
|
- Attacking Heterogeneity with Disaggregated Training for Multimodal Large Language Models
Zili Zhang, Yinmin Zhong, Ranchen Ming, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin SIGCOMM'25. To appear.
|
- Optimizing RLHF Training for Large Language Models with Stage Fusion
Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin NSDI'25. [PDF]
|
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang OSDI'24. [PDF]
|
- Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu ATC'23. [PDF]
|
- ByteTransformer: A high-performance transformer boosted for variable-length inputs
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu IPDPS'23. [PDF] Best paper
|
- Lyra: Elastic scheduling for deep learning clusters
Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang EuroSys'23. [PDF]
|
- Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies
Zhuang Wang, Haibin Lin, Yibo Zhu, T.S. Eugene Ng EuroSys'23. [PDF]
|
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo NSDI'23. [PDF]
|
- SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training
Yangrui Chen, Cong Xie, Meng Ma, Juncheng Gu, Yanghua Peng, Haibin Lin, Chuan Wu, Yibo Zhu NeurIPS'22. [PDF]
|
- Multi-Resource Interleaving for Deep Learning Training
Yihao Zhao, Yuanqiang Liu, Yanghua Peng, Yibo Zhu, Xuanzhe Liu, Xin Jin SIGCOMM'22. [PDF]
|
- dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training
Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo MLSys'22. [PDF]
|
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu MLSys'22. [PDF]
|
- AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy ICLR'21. [PDF]
|
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
Zhihao Bai, Zhen Zhang, Yibo Zhu, Xin Jin OSDI'20. [PDF]
|
- A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo OSDI'20. [PDF]
|
- Elastic Parameter Server Load Distribution in Deep Learning Clusters
Yangrui Chen, Yanghua Peng, Yixin Bao, Chuan Wu, Yibo Zhu, Chuanxiong Guo SoCC'20. [PDF]
|
- A Generic Communication Scheduler for Distributed DNN Training Acceleration
Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo SOSP'19. [PDF]
|
- Tiresias: A GPU Cluster Manager for Distributed Deep Learning
Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, Chuanxiong Guo NSDI'19. [PDF]
|