Data-parallel group as in PyTorch Distributed
PyTorch Distributed has a feature that lets users define data-parallel process groups (https://pytorch.org/docs/stable/distributed.html). This feature is very useful when using model parallelism: gradient reduction then runs only within each data-parallel group rather than across all ranks. We could also potentially use BytePS push_and_pull, together with the gradient compression feature Yuchen contributed, for the gradient communication in DeepSpeed (https://github.com/microsoft/DeepSpeed).
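For reference, here is a minimal single-process sketch of the PyTorch process-group API mentioned above (`dist.new_group`). The group layout (`ranks=[0]`), backend, and tensor values are illustrative only; a real model-parallel job would place each rank into the subgroup holding replicas of the same model shard.

```python
import os
import torch
import torch.distributed as dist

# Single-process sketch (world_size=1). In a real job, rank/world_size
# come from the launcher and each rank joins its own data-parallel group.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# Hypothetical layout: ranks [0] form one data-parallel group.
dp_group = dist.new_group(ranks=[0])

grad = torch.ones(4)
# Gradients are reduced only within the data-parallel group,
# not across the model-parallel dimension.
dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=dp_group)
print(grad.tolist())  # with world_size=1 the tensor is unchanged

dist.destroy_process_group()
```

A communication backend such as BytePS would plug in at the `all_reduce` step, replacing it with push_and_pull calls scoped to the same group.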