streaming
Replication changes sample order
Environment
- mosaicml-streaming==0.7.5
To reproduce
Steps to reproduce the behavior:
- Use `StreamingDataset` in distributed training with the same seed and set `replication` either to None or an integer > 1
- Print out samples across all devices and ignore duplicated samples
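The comparison in the last step can be sketched roughly as follows. This is a minimal illustration, not part of the library: `global_order` is a hypothetical helper, it assumes the sample IDs seen by each device have already been collected into one list per rank, and the ID lists are made-up toy values rather than real output from `StreamingDataset`.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for ranks that ran out of samples early


def global_order(per_rank_ids):
    """Hypothetical helper: merge per-rank sample IDs into one global order.

    Walks the ranks step by step (rank 0, rank 1, ... at step 0, then step 1,
    and so on) and keeps only the first occurrence of each sample ID, so
    samples duplicated by replication are ignored.
    """
    seen = set()
    ordered = []
    for step in zip_longest(*per_rank_ids, fillvalue=_MISSING):
        for sample_id in step:
            if sample_id is _MISSING or sample_id in seen:
                continue
            seen.add(sample_id)
            ordered.append(sample_id)
    return ordered


# Toy inputs (assumed, for illustration only): 4 ranks, 8 samples.
no_replication = [[0, 4], [1, 5], [2, 6], [3, 7]]
# With replication of 2, ranks 0-1 and ranks 2-3 would see identical samples.
with_replication = [[0, 2, 4, 6], [0, 2, 4, 6], [1, 3, 5, 7], [1, 3, 5, 7]]

# The expectation in this report is that both runs yield the same global order.
print(global_order(no_replication) == global_order(with_replication))
```

In a real run, the per-rank lists would come from printing or gathering the sample IDs on every device, once with `replication` set to None and once with an integer > 1.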
Expected behavior
The overall order of the samples should be the same, but using `replication` seems to lead to a different random shuffling of the data.