Pipeline parallelism: projects and resources
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ColossalAI
Making large AI models cheaper, faster and more accessible
PaddleFleetX
PaddlePaddle's large-model development suite, providing a full-pipeline development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
torchgpipe
A GPipe implementation in PyTorch
DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Models
pytorch-gpt-x
Implementation of an autoregressive language model using an improved Transformer and DeepSpeed pipeline parallelism.
FTPipe
FTPipe and related pipeline model parallelism research.
awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
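Most of the projects above implement some variant of GPipe-style scheduling: a mini-batch is split into micro-batches that flow through the pipeline stages, so different stages work on different micro-batches at the same time, and only the fill/drain phases leave devices idle (the "pipeline bubble"). A minimal, framework-free sketch of that schedule and its bubble overhead, with all function names being illustrative rather than taken from any listed library:

```python
def gpipe_schedule(num_stages, num_microbatches):
    """For each clock tick, list the (stage, microbatch) pairs that
    run concurrently under a GPipe-style forward schedule: stage s
    processes micro-batch t - s at tick t."""
    total_ticks = num_stages + num_microbatches - 1  # fill + steady + drain
    return [
        [(s, t - s) for s in range(num_stages)
         if 0 <= t - s < num_microbatches]
        for t in range(total_ticks)
    ]

def bubble_fraction(num_stages, num_microbatches):
    """Fraction of stage-ticks left idle: (p - 1) / (m + p - 1)
    for p stages and m micro-batches."""
    total_slots = num_stages * (num_stages + num_microbatches - 1)
    busy_slots = num_stages * num_microbatches
    return 1 - busy_slots / total_slots
```

For example, with 4 stages and 8 micro-batches the schedule spans 11 ticks and the bubble fraction is 3/11, which is why these libraries encourage using many more micro-batches than stages: as m grows, (p - 1)/(m + p - 1) shrinks toward zero.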