pytorch-distributed

A quickstart and benchmark for PyTorch distributed training.

14 pytorch-distributed issues

Hi there, great repo! I'm studying this topic and found that the [official repo](https://github.com/pytorch/examples/blob/master/imagenet/main.py) for ImageNet classification also uses multiprocessing. I noticed one place that they not only use...

enhancement

When using NCCL as the communication backend for distributed training, I found that none of the operations for gathering variables from other groups work. The program stops because of...

When I used multiprocessing distributed training, I encountered an error: `Can't pickle : attribute lookup main_worker on __main__ failed`. I got this error even without making any changes...
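The error text indicates that `main_worker` could not be found by name on `__main__` when it was pickled for the child processes. Python's `pickle` stores functions by reference (module plus qualified name), and `torch.multiprocessing.spawn` starts workers with the `spawn` method, so the worker function generally has to be a top-level attribute of an importable module. This is an assumption about the likely cause, not a confirmed fix for this report. The stdlib-only sketch below shows the constraint; `can_pickle` and `make_nested_worker` are hypothetical names for illustration.

```python
import pickle


def main_worker(rank: int) -> int:
    """Defined at module top level, so pickle can find it by module + name."""
    return rank * 2


def make_nested_worker():
    # A function defined inside another function is a "local object":
    # pickle cannot look it up by module + qualified name.
    def nested_worker(rank: int) -> int:
        return rank * 2

    return nested_worker


def can_pickle(fn) -> bool:
    """Return True if `fn` can be serialized by reference with pickle."""
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print(can_pickle(main_worker))            # top-level function
    print(can_pickle(make_nested_worker()))   # local function
    print(can_pickle(lambda rank: rank))      # lambda: lookup of "<lambda>" on __main__ fails
```

In a training script this suggests keeping `main_worker` at module top level and passing it to `torch.multiprocessing.spawn` only from inside an `if __name__ == "__main__":` guard.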

Bumps [torch](https://github.com/pytorch/pytorch) from 1.3.0 to 2.2.0. Release notes, sourced from torch's releases: PyTorch 2.2: FlashAttention-v2, AOTInductor. PyTorch 2.2 Release Notes: Highlights, Backwards Incompatible Changes, Deprecations, New Features, Improvements, Bug fixes...

dependencies