Unavailable Distributed Store for MPI backend

Open ejguan opened this issue 3 years ago • 0 comments

🐛 Describe the bug

We previously landed a change that makes DataLoader communicate across distributed processes to share the random seed for each epoch. However, I just found that the distributed store is not available when the MPI backend is used.

https://github.com/pytorch/pytorch/blob/706b99030656c573619cebaa3be9298a575fc776/torch/utils/data/dataloader.py#L574

To fix that, we should switch the seed sharing from the distributed store back to a process group created via `dist.new_group`. I will make the same change when I implement DistributedReadingService.
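A minimal sketch of what sharing the seed through a process group could look like, assuming rank 0's value is broadcast to the other ranks. The helper name `share_seed` and its parameters are hypothetical and only illustrate the idea; this is not necessarily the change that will land.

```python
import torch
import torch.distributed as dist


def share_seed(generator=None, group=None):
    """Hypothetical helper: return a seed that is identical on every rank of `group`."""
    # Every rank draws a candidate seed locally.
    seed = torch.empty((), dtype=torch.int64).random_(generator=generator)
    # When a process group is initialized, overwrite the local value with rank 0's.
    # Collectives go through the process group, so no distributed store is needed.
    if dist.is_available() and dist.is_initialized():
        dist.broadcast(seed, src=0, group=group)
    # Without distributed initialization this falls back to the local seed.
    return int(seed.item())
```

In a distributed run, the group would be created once after `dist.init_process_group(...)`, for example `pg = dist.new_group(backend="gloo")` (the choice of backend here is an assumption), and then passed in as `share_seed(generator, group=pg)`.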

Versions

PyTorch master, TorchData main

ejguan, Sep 14 '22 14:09