nccl topic
bluefog
Distributed and decentralized training framework for PyTorch over graphs
nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
large_language_model_training_playbook
An open collection of implementation tips, tricks and resources for training large language models
llm_training_handbook
An open collection of methodologies to help with successful training of large language models.
msrflute
Federated Learning Utilities and Tools for Experimentation
pyDNMFk
Python Distributed Non-Negative Matrix Factorization with custom clustering
NCCL.jl
A Julia wrapper for the NVIDIA Collective Communications Library.
NCCL
Examples of how to call collective operation functions in multi-GPU environments, covering the broadcast, reduce, allGather, reduceScatter and sendRecv operations.
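
For readers unfamiliar with the five operations named above, here is a minimal sketch of their data-movement semantics in plain Python. It does not call NCCL and is not taken from the repository: each "rank" is modeled as a Python list, the function names are hypothetical, and returning None for non-root ranks in reduce is a modeling choice standing in for "receive buffer not written".

```python
from functools import reduce as fold
from operator import add

# Each inner list stands in for one rank's send buffer (an assumption of
# this sketch; real NCCL operates on device memory).

def broadcast(buffers, root):
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]

def reduce(buffers, root, op=add):
    """Only the root receives the element-wise reduction; others get None."""
    total = [fold(op, column) for column in zip(*buffers)]
    return [total if rank == root else None for rank in range(len(buffers))]

def all_gather(buffers):
    """Every rank receives all buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]

def reduce_scatter(buffers, op=add):
    """Reduce element-wise, then hand rank i the i-th equal-sized chunk."""
    n = len(buffers)
    total = [fold(op, column) for column in zip(*buffers)]
    chunk = len(total) // n  # assumes the length divides evenly by the rank count
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]

def send_recv(buffers, src, dst):
    """Point-to-point: the dst rank's buffer is replaced by a copy of src's."""
    out = [list(b) for b in buffers]
    out[dst] = list(buffers[src])
    return out

ranks = [[1, 2], [10, 20]]  # two ranks, two elements each
```

With the two-rank input above, `all_gather(ranks)` gives each rank `[1, 2, 10, 20]`, while `reduce_scatter(ranks)` gives rank 0 `[11]` and rank 1 `[22]`.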