pytorch-distributed
A quickstart and benchmark for PyTorch distributed training.
Hi there, great repo! I'm studying this topic and found that the [official repo](https://github.com/pytorch/examples/blob/master/imagenet/main.py) for ImageNet classification also uses multiprocessing. I noticed one place where they not only use...
When using NCCL as my communication backend in distributed learning, I found that none of the operations that gather variables from other groups work. The program stops because of...
When I used multiprocessing distributed, I encountered an error: Can't pickle : attribute lookup main_worker on __main__ failed. I get this error even if I did not make any changes...
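For context on this class of error, and not as a diagnosis of this particular report: `pickle` serializes a function by reference (its module plus its qualified name), and start methods that pickle the target function, such as `spawn`, need that lookup to succeed in the child process. The sketch below uses only the standard library. `main_worker` is named after the function in the report, while `make_local_worker` and `is_picklable` are hypothetical helpers introduced for illustration.

```python
import pickle


def main_worker(rank, world_size):
    """Hypothetical top-level worker, named after the function in the report."""
    return f"worker {rank} of {world_size}"


def make_local_worker():
    """Return a worker defined inside another function (a local object)."""
    def local_worker(rank, world_size):
        return f"worker {rank} of {world_size}"
    return local_worker


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # A function pickles only if it can be found by name in its module.
    print("top-level main_worker picklable:", is_picklable(main_worker))
    # A function defined inside another function cannot be looked up by name.
    print("nested worker picklable:", is_picklable(make_local_worker()))
```

If the worker is defined at module top level and the launch code sits under an `if __name__ == "__main__":` guard, this lookup usually succeeds. Whether that applies here depends on details the report does not show.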
Bumps [torch](https://github.com/pytorch/pytorch) from 1.3.0 to 2.2.0. Release notes, sourced from torch's releases: PyTorch 2.2: FlashAttention-v2, AOTInductor. PyTorch 2.2 Release Notes: Highlights, Backwards Incompatible Changes, Deprecations, New Features, Improvements, Bug fixes...