Unavailable Distributed Store for MPI backend
🐛 Describe the bug
We previously landed a change that makes DataLoader communicate across distributed processes to share the random seed per epoch. However, I just found that the distributed store is not available when the MPI backend is used.
https://github.com/pytorch/pytorch/blob/706b99030656c573619cebaa3be9298a575fc776/torch/utils/data/dataloader.py#L574
To fix that, we should switch from the distributed store back to a process group created via dist.new_group. I will make the same change when I implement DistributedReadingService.
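
A minimal sketch of what sharing the seed over a process group could look like, using only `dist.new_group` and `dist.broadcast`. The helper name `share_seed` is hypothetical and this is not necessarily the code that will land. It assumes rank 0 is a member of the group and that the group's backend can broadcast CPU tensors (as Gloo and MPI can).

```python
import torch
import torch.distributed as dist


def share_seed(group=None, generator=None):
    """Hypothetical helper: agree on one seed across all ranks of `group`."""
    # Every rank draws a candidate seed; the broadcast below overwrites it
    # on all ranks with the value drawn on global rank 0.
    seed = torch.empty((), dtype=torch.int64).random_(generator=generator)
    # Collective call over the process group, so no distributed store is needed.
    dist.broadcast(seed, src=0, group=group)
    return int(seed.item())
```

After `dist.init_process_group(...)` has run, each rank would call `pg = dist.new_group()` once and then `share_seed(pg)` at the start of every epoch.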
Versions
PyTorch master, TorchData main