distributed-training topic

List distributed-training repositories

determined

2.9k Stars, 346 Forks

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

deeplearning-cfn

256 Stars, 115 Forks

Distributed Deep Learning on AWS Using CloudFormation (CFN), MXNet and TensorFlow

terngrad

180 Stars, 48 Forks

Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)

torchx

304 Stars, 96 Forks

TorchX is a universal job launcher for PyTorch applications. TorchX is designed for fast iteration during training and research, and it supports end-to-end (E2E) production ML pipelines when you're ready.

adanet

3.5k Stars, 531 Forks

Fast and flexible AutoML with learning guarantees.

PLSC

149 Stars, 34 Forks

Paddle Large Scale Classification Tools: supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, and CAE.

YOLO3D-YOLOv4-PyTorch

286 Stars, 44 Forks

YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCV 2018)

alpa

3.0k Stars, 344 Forks

Training and serving large-scale neural networks with auto parallelization.

adaptdl

406 Stars, 74 Forks

Resource-adaptive cluster scheduler for deep learning training.

KungFu

289 Stars, 58 Forks

Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.