Albert Zeyer
> I think before merging this should get a dedicated test around the sharding. Well then this should be a draft for now?
> @albertz Do you think this needs a test around the config processing? I'm not exactly sure what you mean by that. But e.g. there could be a test with...
To clarify: this also fixes #1678?
Sorry for introducing the small conflict, but my change should fix #1678 and #1737 already, and shouldn't really cause any issues to merge with the PR here.
Btw, also see #1738. Not sure if this is relevant here.
Can you summarize what this PR does? I will also try to write some summaries here myself, but please edit the main description of the PR to cover that as...
(Summary) Added feature: when `torch.utils.data.DataLoader` is used with `num_workers>1`, this sets the sharding accordingly. (This is independent of the newly introduced global `dataset_distribution` option.) Btw, some questions regarding this:...
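To make the worker-sharding idea concrete: in PyTorch, each `DataLoader` worker can learn its index via `torch.utils.data.get_worker_info()`, and a common scheme is a strided split of the data across workers. Whether this PR uses exactly this strided scheme is an assumption here; this is just a minimal sketch of the split itself:

```python
def shard_slice(items, shard_index, num_shards):
    """Strided sharding: shard `shard_index` of `num_shards` gets every
    num_shards-th item, starting at its own index. In a DataLoader worker,
    shard_index/num_shards would come from torch.utils.data.get_worker_info()
    (worker_info.id / worker_info.num_workers)."""
    return items[shard_index::num_shards]

# Each shard gets a disjoint subset, and together they cover all items:
data = list(range(10))
shards = [shard_slice(data, i, 3) for i in range(3)]
```

The key invariant for correctness is that the shards are disjoint and their union is the whole dataset, so no sample is duplicated or dropped across workers.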
Shouldn't `_get_random_seed_for_epoch` also consider `num_shards`/`shard_index`? Or only in the case of `dataset_distribution=="shard"`? Or not at all, because `random_seed_offset` already covers this part? (But I find it a bit inconsistent that...
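For illustration, a hypothetical sketch of what mixing `shard_index` into the per-epoch seed could look like. This is not RETURNN's actual `_get_random_seed_for_epoch` (its real signature and hashing scheme are not shown in this thread); it just demonstrates a deterministic derivation where each shard can get a distinct but reproducible seed:

```python
import hashlib

def seed_for_epoch(epoch, random_seed_offset=0, shard_index=0, num_shards=1):
    """Hypothetical seed derivation: hash all inputs together so the result
    is deterministic, differs across epochs and shards, and fits in 64 bits.
    Whether shard_index *should* enter the seed is exactly the open question
    above (identical seeds per shard would keep the pre-sharding order aligned)."""
    key = f"{epoch}:{random_seed_offset}:{shard_index}:{num_shards}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "little")
```

Deterministic hashing (rather than e.g. `epoch + shard_index`) avoids accidental seed collisions between different (epoch, offset, shard) combinations.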
(Summary) New global config option `dataset_distribution`, which can be set to either `"random_seed_offset"` (default) or `"shard"`. This is for distributed training. `"shard"` will enable sharding for the dataset, so on...
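For reference, a hedged sketch of how the new option might look in a RETURNN config file (the surrounding settings are hypothetical placeholders; only `dataset_distribution` and its two values are from this PR):

```python
# Hypothetical RETURNN config fragment.
# "shard": each distributed worker reads a disjoint shard of the dataset.
# "random_seed_offset" (default): each worker shuffles with a different seed
# over the full dataset instead.
dataset_distribution = "shard"
```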
I think it's just not implemented yet.