
Replication changes sample order

Open CodeCreator opened this issue 7 months ago • 3 comments

Environment

  • mosaicml-streaming==0.7.5

To reproduce

Steps to reproduce the behavior:

  1. Use StreamingDataset in distributed training with the same seed, once with replication set to None and once with it set to an integer > 1
  2. Print the samples seen across all devices, ignoring duplicated samples
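
The comparison in step 2 could be sketched roughly as follows. This is not part of mosaicml-streaming: `global_order` and `first_mismatch` are hypothetical helpers, and the sketch assumes each rank has logged the sample IDs it saw, in order. Interleaving ranks step by step is only one possible definition of the "overall order". The ID lists are made-up toy data, not output from a real run.

```python
from itertools import zip_longest

_MISSING = object()  # filler for ranks that logged fewer samples


def global_order(per_rank_ids):
    """Merge per-rank sample IDs into one order, ignoring duplicates.

    Assumption: ranks are interleaved step by step (step 0 of every rank,
    then step 1, ...), and an ID is kept only the first time it appears.
    """
    seen = set()
    order = []
    for step in zip_longest(*per_rank_ids, fillvalue=_MISSING):
        for sample_id in step:
            if sample_id is _MISSING or sample_id in seen:
                continue
            seen.add(sample_id)
            order.append(sample_id)
    return order


def first_mismatch(a, b):
    """Return the first index where two orders differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Toy logs for 4 ranks (hypothetical data).
no_replication = [[0, 4], [1, 5], [2, 6], [3, 7]]
# With replication=2, consecutive pairs of ranks see the same samples.
with_replication = [[0, 2, 4, 6], [0, 2, 4, 6], [1, 3, 5, 7], [1, 3, 5, 7]]

order_a = global_order(no_replication)
order_b = global_order(with_replication)
print(first_mismatch(order_a, order_b))
```

Under the expected behavior, `first_mismatch` returns None for the two runs; the issue reports that in practice the deduplicated orders differ.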

Expected behavior

The overall order of the samples should be the same with or without replication. Instead, using replication seems to lead to a different random shuffling of the data.

CodeCreator, Jul 15 '24 16:07