Awesome Distributed Machine Learning System

A curated list of awesome projects and papers for distributed training or inference, especially for large models.

Contents

  • Awesome Distributed Machine Learning System
    • Contents
    • Open Source Projects
    • Papers
      • Survey
      • Pipeline Parallelism
      • Sequence Parallelism
      • Mixture-of-Experts System
      • Graph Neural Networks System
      • Hybrid Parallelism & Framework
      • Memory Efficient Training
      • Tensor Movement
      • Auto Parallelization
      • Communication Optimization
      • Fault-tolerant Training
      • Inference and Serving
      • Applications
    • Contribute

Open Source Projects

Papers

Survey

Pipeline Parallelism

Sequence Parallelism

Mixture-of-Experts System

Graph Neural Networks System

Hybrid Parallelism & Framework

Memory Efficient Training

Tensor Movement

Auto Parallelization

Communication Optimization

Fault-tolerant Training

Inference and Serving

Applications

Contribute

All contributions to this repository are welcome. Open an issue or send a pull request.