model-parallelism
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
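A minimal sketch of how DeepSpeed typically wraps a PyTorch training loop; the toy model, learning rate, and ZeRO stage below are illustrative choices, not library defaults, and the script is assumed to run under the `deepspeed` launcher (which sets up the distributed environment):

```python
import torch
import deepspeed

# A toy model; any torch.nn.Module works.
model = torch.nn.Linear(1024, 1024)

# Illustrative config: ZeRO stage 2 partitions optimizer state and
# gradients across data-parallel ranks.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize returns an engine that owns distributed setup,
# optimizer-state sharding, and mixed precision (if configured).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(32, 1024).to(engine.device)
loss = engine(x).sum()
engine.backward(loss)  # engine-managed backward (ZeRO-aware)
engine.step()          # optimizer step via the engine
```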
ColossalAI
Making large AI models cheaper, faster and more accessible
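A hedged sketch of ColossalAI's Booster interface; the API names match recent releases but exact signatures vary across versions, so treat this as an outline rather than the definitive API (run under `torchrun` with GPUs available):

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

# Initialize the distributed environment from torchrun-provided env vars.
# Note: older releases require a config dict here; newer ones drop it.
colossalai.launch_from_torch(config={})

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

# The Booster wraps model/optimizer according to the chosen plugin;
# swapping TorchDDPPlugin for a ZeRO or hybrid-parallel plugin changes
# the parallelization strategy without touching the training loop.
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(8, 1024).cuda()
loss = criterion(model(x), torch.randn(8, 1024).cuda())
booster.backward(loss, optimizer)
optimizer.step()
```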
PaddleFleetX
PaddlePaddle's large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
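PaddleFleetX builds on PaddlePaddle's `paddle.distributed.fleet` collective API. A small data-parallel sketch of that underlying API (not PaddleFleetX's own toolkit entry points), assuming launch via `python -m paddle.distributed.launch`:

```python
import paddle
from paddle.distributed import fleet

# Initialize collective (GPU) distributed training.
strategy = fleet.DistributedStrategy()
fleet.init(is_collective=True, strategy=strategy)

model = paddle.nn.Linear(1024, 1024)
optimizer = paddle.optimizer.Adam(learning_rate=1e-4,
                                  parameters=model.parameters())

# fleet wraps the model and optimizer for data-parallel training;
# parallelism settings live in the DistributedStrategy.
model = fleet.distributed_model(model)
optimizer = fleet.distributed_optimizer(optimizer)

x = paddle.randn([8, 1024])
loss = model(x).mean()
loss.backward()
optimizer.step()
optimizer.clear_grad()
```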
libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
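EPL's pitch is parallelizing an otherwise unchanged model with a few annotations. A rough sketch based on the project's README; EPL targets TensorFlow 1.x graphs, and the exact annotation names and signatures here should be treated as assumptions:

```python
import epl
import tensorflow as tf  # EPL rewrites TensorFlow 1.x graphs

epl.init()

# Wrapping the whole model in one replicate scope yields plain data
# parallelism; splitting the model across several scopes would instead
# define pipeline stages.
with epl.replicate(device_count=1):
    x = tf.placeholder(tf.float32, [None, 1024])
    logits = tf.layers.dense(x, 10)
```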
torchgpipe
A GPipe implementation in PyTorch
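torchgpipe exposes a single `GPipe` wrapper over an `nn.Sequential`. A small sketch assuming two visible GPUs; the layer sizes, balance, and chunk count are arbitrary:

```python
import torch
from torch import nn
from torchgpipe import GPipe

# GPipe requires an nn.Sequential so it can split the model into
# consecutive partitions (one per device).
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
)

# balance: how many layers each partition gets; chunks: number of
# micro-batches pipelined through the partitions.
model = GPipe(model, balance=[2, 3],
              devices=["cuda:0", "cuda:1"], chunks=4)

# Inputs go to the first partition's device; outputs come from the last.
x = torch.randn(32, 1024).to(model.devices[0])
loss = model(x).sum()
loss.backward()
```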
veloce
WIP: Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine learning computation.
awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)*