distributed-training topic
determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
deeplearning-cfn
Distributed Deep Learning on AWS Using CloudFormation (CFN), MXNet and TensorFlow
terngrad
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)
torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed for fast iteration during training and research, and supports E2E production ML pipelines when you're ready.
adanet
Fast and flexible AutoML with learning guarantees.
PLSC
Paddle Large Scale Classification Tools, supporting ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, CAE.
YOLO3D-YOLOv4-PyTorch
YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCV 2018)
alpa
Training and serving large-scale neural networks with auto parallelization.
adaptdl
Resource-adaptive cluster scheduler for deep learning training.
KungFu
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.