Albert Zeyer
> I think before merging this should get a dedicated test around the sharding. Well then this should be a draft for now?
> @albertz Do you think this needs a test around the config processing? I'm not exactly sure what you mean by that. But e.g. there could be a test with...
To clarify: this also fixes #1678?
Sorry for introducing the small conflict, but my change should fix #1678 and #1737 already, and shouldn't really cause any issues to merge with the PR here.
Btw, also see #1738. Not sure if this is relevant here.
Can you summarize what this PR does? I will also try to write some summaries here myself, but please edit the main description of the PR to cover that as...
(Summary) Added feature: when `torch.utils.data.DataLoader` is used with `num_workers>1`, this sets the sharding accordingly. (This is independent of the newly introduced global `dataset_distribution` option.) Btw, some questions regarding this:...
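To make the worker-sharding idea concrete: in PyTorch, each `DataLoader` worker can learn its index via `torch.utils.data.get_worker_info()`, and a common scheme is a strided split of the data across workers. Whether this PR uses exactly this strided scheme is an assumption here; this is just a minimal sketch of the split itself:

```python
def shard_slice(items, shard_index, num_shards):
    """Strided sharding: shard `shard_index` of `num_shards` gets every
    num_shards-th item, starting at its own index. In a DataLoader worker,
    shard_index/num_shards would come from torch.utils.data.get_worker_info()
    (worker_info.id / worker_info.num_workers)."""
    return items[shard_index::num_shards]

# Each shard gets a disjoint subset, and together they cover all items:
data = list(range(10))
shards = [shard_slice(data, i, 3) for i in range(3)]
```

The key invariant for correctness is that the shards are disjoint and their union is the whole dataset, so no sample is duplicated or dropped across workers.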
Shouldn't `_get_random_seed_for_epoch` also consider `num_shards`/`shard_index`? Or only in the case of `dataset_distribution=="shard"`? Or not at all, because `random_seed_offset` already covers this part? (But I find it a bit inconsistent that...
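For illustration, a hypothetical sketch of what mixing `shard_index` into the per-epoch seed could look like. This is not RETURNN's actual `_get_random_seed_for_epoch` (its real signature and hashing scheme are not shown in this thread); it just demonstrates a deterministic derivation where each shard can get a distinct but reproducible seed:

```python
import hashlib

def seed_for_epoch(epoch, random_seed_offset=0, shard_index=0, num_shards=1):
    """Hypothetical seed derivation: hash all inputs together so the result
    is deterministic, differs across epochs and shards, and fits in 64 bits.
    Whether shard_index *should* enter the seed is exactly the open question
    above (identical seeds per shard would keep the pre-sharding order aligned)."""
    key = f"{epoch}:{random_seed_offset}:{shard_index}:{num_shards}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "little")
```

Deterministic hashing (rather than e.g. `epoch + shard_index`) avoids accidental seed collisions between different (epoch, offset, shard) combinations.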
(Summary) New global config option `dataset_distribution`, which can be set to either `"random_seed_offset"` (default) or `"shard"`. This is for distributed training. `"shard"` will enable sharding for the dataset, so on...
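For reference, a hedged sketch of how the new option might look in a RETURNN config file (the surrounding settings are hypothetical placeholders; only `dataset_distribution` and its two values are from this PR):

```python
# Hypothetical RETURNN config fragment.
# "shard": each distributed worker reads a disjoint shard of the dataset.
# "random_seed_offset" (default): each worker shuffles with a different seed
# over the full dataset instead.
dataset_distribution = "shard"
```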
I think it's just not implemented yet.