Yibo Zhu

I am working on a stealth startup, after briefly working at Google. Before Google, I was a Director at ByteDance, responsible for the company's AI infrastructure. I started my career as a Researcher at Microsoft Research.

My interests include distributed systems and networks. I enjoy building large-scale software systems with emerging hardware like GPUs, RDMA NICs and programmable ASICs.

I obtained my Ph.D. from the Department of Computer Science at UCSB, co-advised by Prof. Ben Y. Zhao and Prof. Heather Zheng. I received my bachelor's degree from Tsinghua University.

I am a recipient of the Microsoft Research Ph.D. Fellowship (2015).

Selected academic service: TPC for SIGCOMM ('18, '20, '21, '23), ATC ('22, '23, '24), CoNEXT'18. TPC Chair for KBNets'18. General Chair for HotNets'18.


AI + System
  • DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
    Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang
    OSDI'24. To appear.
  • Accelerating Distributed MoE Training and Inference with Lina
    Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu
    ATC'23. [PDF]
  • ByteTransformer: A high-performance transformer boosted for variable-length inputs
    Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu
    IPDPS'23. [PDF]
    Best paper
  • Lyra: Elastic scheduling for deep learning clusters
    Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang
    EuroSys'23. [PDF]
  • Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies
    Zhuang Wang, Haibin Lin, Yibo Zhu, T.S. Eugene Ng
    EuroSys'23. [PDF]
  • BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
    Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
    NSDI'23. [PDF]
  • SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training
    Yangrui Chen, Cong Xie, Meng Ma, Juncheng Gu, Yanghua Peng, Haibin Lin, Chuan Wu, Yibo Zhu
    NeurIPS'22. [PDF]
  • Multi-Resource Interleaving for Deep Learning Training
    Yihao Zhao, Yuanqiang Liu, Yanghua Peng, Yibo Zhu, Xuanzhe Liu, Xin Jin
    SIGCOMM'22. [PDF]
  • dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training
    Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
    MLSys'22. [PDF]
  • Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
    Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu
    MLSys'22. [PDF]
  • AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
    Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy
    ICLR'21. [PDF]
  • PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
    Zhihao Bai, Zhen Zhang, Yibo Zhu, Xin Jin
    OSDI'20. [PDF]
  • A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
    Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo
    OSDI'20. [PDF]
  • Elastic Parameter Server Load Distribution in Deep Learning Clusters
    Yangrui Chen, Yanghua Peng, Yixin Bao, Chuan Wu, Yibo Zhu, Chuanxiong Guo
    SoCC'20. [PDF]
  • A Generic Communication Scheduler for Distributed DNN Training Acceleration
    Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo
    SOSP'19. [PDF]
  • Tiresias: A GPU Cluster Manager for Distributed Deep Learning
    Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, Chuanxiong Guo
    NSDI'19. [PDF]
RDMA Networks
  • Collie: Finding Performance Anomalies in RDMA Subsystems
    Xinhao Kong, Yibo Zhu, Huaping Zhou, Zhuo Jiang, Jianxi Ye, Chuanxiong Guo, Danyang Zhuo
    NSDI'22. [PDF]
  • TEA: Enabling State-Intensive Network Functions on Programmable Switches
    Daehyeok Kim, Zaoxing Liu, Yibo Zhu, Changhoon Kim, Jeongkeun Lee, Vyas Sekar, Srinivasan Seshan
    SIGCOMM'20. [PDF]
  • FreeFlow: Software-based RDMA Virtual Networking for Containerized Clouds
    Daehyeok Kim, Tianlong Yu, Hongqiang Harry Liu, Yibo Zhu, Jitu Padhye, Shachar Raindel, Chuanxiong Guo, Vyas Sekar, Srinivasan Seshan
    NSDI'19. [PDF]
  • Generic External Memory for Switch Data Planes
    Daehyeok Kim, Yibo Zhu, Changhoon Kim, Jeongkeun Lee, Srinivasan Seshan
    HotNets'18. [PDF]
  • HyperLoop: Group-Based NIC-Offloading to Accelerate Replicated Transactions in Multi-Tenant Storage Systems
    Daehyeok Kim, Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitendra Padhye, Shachar Raindel, Steven Swanson, Vyas Sekar, Srinivasan Seshan
    SIGCOMM'18. [PDF]
  • Tagger: Practical PFC Deadlock Prevention in Data Center Networks
    Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, Kai Chen
    CoNEXT'17. [PDF]
  • ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY
    Yibo Zhu, Monia Ghobadi, Vishal Misra, Jitendra Padhye
    CoNEXT'16. [PDF]
    Best Paper Runner-up
  • Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them
    Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, Kai Chen
    HotNets'16. [PDF]
  • Congestion Control for Large-scale RDMA Deployments
    Yibo Zhu, Daniel Firestone, Chuanxiong Guo, Jitendra Padhye, Shachar Raindel, Ming Zhang, Yehonatan Liron, Haggai Eran, Mohamad Haj Yahia and Marina Lipshteyn.
    SIGCOMM'15. [PDF]
Network Reliability
  • dShark: A General, Easy to Program and Scalable Framework for Analyzing In-network Packet Traces
    Da Yu, Yibo Zhu, Behnaz Arzani, Rodrigo Fonseca, Tianrong Zhang, Lihua Yuan, Karl Deng
    NSDI'19. [PDF]
  • 007: Democratically Finding The Cause of Packet Drops
    Behnaz Arzani, Selim Ciraci, Luiz Chamon, Yibo Zhu, Hongqiang Harry Liu, Jitendra Padhye, Geoff Outhred, Boon Thau Loo
    NSDI'18. [PDF]
  • CrystalNet: Faithfully Emulating Large Production Networks
    Hongqiang Harry Liu*, Yibo Zhu*, Jitu Padhye, Jiaxin Cao, Sri Tallapragada, Nuno P. Lopes, Andrey Rybalchenko, Guohan Lu, Lihua Yuan
    SOSP'17. [PDF]
    *Co-primary authors
  • Packet-Level Telemetry in Large Datacenter Networks
    Yibo Zhu, Nanxi Kang, Jiaxin Cao, Albert Greenberg, Guohan Lu, Ratul Mahajan, Dave Maltz, Lihua Yuan, Ming Zhang, Haitao Zheng and Ben Zhao.
    SIGCOMM'15. [PDF]
  • Cutting the Cord: A Robust Wireless Facilities Network for Data Centers
    Yibo Zhu, Xia Zhou, Zengbin Zhang, Lin Zhou, Amin Vahdat, Ben Y. Zhao and Haitao Zheng.
    MobiCom'14. [PDF]
Network Stack in Software
  • Towards Timeout-less Transport in Commodity Datacenter Networks
    Hwijoon Lim, Wei Bai, Yibo Zhu, Youngmok Jung, Dongsu Han
    EuroSys'21. [PDF]
  • Congestion Control for Cross-Datacenter Networks
    Gaoxiong Zeng, Wei Bai, Ge Chen, Kai Chen, Dongsu Han, Yibo Zhu, Lei Cui
    ICNP'19. [PDF]
  • Slim: OS Kernel Support for a Low-Overhead Container Overlay Network
    Danyang Zhuo, Kaiyuan Zhang, Yibo Zhu, Hongqiang Harry Liu, Matthew Rockett, Arvind Krishnamurthy, Thomas Anderson
    NSDI'19. [PDF]
  • Combining ECN and RTT for Datacenter Transport
    Gaoxiong Zeng, Wei Bai, Ge Chen, Kai Chen, Dongsu Han, Yibo Zhu
    APNet'17. [PDF]
  • Trimming the Smartphone Network Stack
    Yanzi Zhu, Yibo Zhu, Ana Nika, Ben Y. Zhao, Haitao Zheng
    HotNets'16. [PDF]
  • Datacast: A Scalable and Efficient Reliable Group Data Delivery Service for Data Centers
    Jiaxin Cao, Chuanxiong Guo, Guohan Lu, Yongqiang Xiong, Yixin Zheng, Yongguang Zhang, Yibo Zhu, Chen Chen, and Ye Tian.
    JSAC, 31(12):2632-2645, 2013. [PDF]
  • Datacast: A Scalable and Efficient Reliable Group Data Delivery Service for Data Centers
    Jiaxin Cao, Chuanxiong Guo, Guohan Lu, Yongqiang Xiong, Yixin Zheng, Yongguang Zhang, Yibo Zhu, Chen Chen.
    CoNEXT'12. [PDF]
Wireless and Mobile
  • Empirical Validation of Commodity Spectrum Monitoring
    Ana Nika, Zhijing Li, Yanzi Zhu, Yibo Zhu, Ben Y. Zhao, Xia Zhou and Haitao Zheng.
    SenSys'16. [PDF]
  • Reusing 60GHz Radios for Mobile Radar Imaging
    Yanzi Zhu, Yibo Zhu, Ben Y. Zhao and Haitao Zheng.
    MobiCom'15. [PDF]
  • Energy and Performance of Smartphone Radio Bundling in Outdoor Environments
    Ana Nika, Yibo Zhu, Ning Ding, Abhilash Jindal, Y. Charlie Hu, Xia Zhou, Ben Zhao and Haitao Zheng.
    WWW'15. [PDF]
  • 60GHz Mobile Imaging Radar
    Yibo Zhu, Yanzi Zhu, Zengbin Zhang, Ben Y. Zhao and Haitao Zheng.
    HotMobile'15. [PDF]
  • Demystifying 60GHz Outdoor Picocells
    Yibo Zhu, Zengbin Zhang, Zhinus Marzi, Chris Nelson, Upamanyu Madhow, Ben Y. Zhao and Haitao Zheng.
    MobiCom'14. [PDF]
  • Mirror Mirror on the Ceiling: Flexible Wireless Links for Data Centers
    Xia Zhou, Zengbin Zhang, Yibo Zhu, Yubo Li, Saipriya Kumar, Amin Vahdat, Haitao Zheng and Ben Y. Zhao.
    SIGCOMM'12. [PDF]
Online Social Networks and P2P
  • Serf and Turf: Crowdturfing for Fun and Profit
    Gang Wang, Christo Wilson, Xiaohan Zhao, Yibo Zhu, Manish Mohanlal, Haitao Zheng and Ben Y. Zhao.
    WWW'12. [PDF]
  • Towards an Accurate Network Coordinate System by Handling Major Portion of TIVs
    Zhuo Chen, Yang Chen, Yibo Zhu, Cong Ding, Beixing Deng, Xing Li.
    GLOBECOM'11. [PDF]
  • Taming the Triangle Inequality Violations with Network Coordinate System on Real Internet
    Yibo Zhu, Yang Chen, Zengbin Zhang, Xiaoming Fu, Dan Li, Beixing Deng, Xing Li.
    ReArch'10. [PDF]