Sebastian Hoffmann
Notice that the above example is just for demonstration purposes. In a real pipeline these two sharding operations might take place in vastly different places. So replacing them with one...
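For illustration, a minimal sketch of what I mean (file names and intermediate steps are hypothetical, not taken from the example above):

```
# Two sharding operations living in very different parts of one pipeline:
# sharding across distributed ranks right after listing files, and sharding
# across dataloader workers much later in the graph.
from torchdata.datapipes.iter import IterableWrapper
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES

files = IterableWrapper([f"shard_{i}.tar" for i in range(16)])  # hypothetical shards

# Shard across distributed ranks as early as possible ...
dp = files.shuffle().sharding_filter(sharding_group_filter=SHARDING_PRIORITIES.DISTRIBUTED)

# ... many intermediate steps (opening, decoding, transforms, ...) would sit here ...

# ... and shard across multiprocessing workers only at the very end.
dp = dp.sharding_filter(sharding_group_filter=SHARDING_PRIORITIES.MULTIPROCESSING)
```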
@ejguan `set_graph_random_seed` does not account for different sharding priorities either (https://github.com/pytorch/data/blob/main/torchdata/dataloader2/graph/settings.py#L31)
I find the intended behavior a bit problematic anyway: the usual principle with respect to multiprocessing right now is that every worker executes the same pipeline. If a sharding filter is encountered,...
It is also not clear to me right now how such shuffle operations are supposed to behave if one wants to set a fixed seed via `DataLoader2.seed()`.
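Concretely, what I have in mind is something like this (a minimal sketch with made-up numbers, assuming a `MultiProcessingReadingService` with two workers):

```
# Shuffle before the sharding filter, fixed seed set on the DataLoader2:
# how is the shuffle supposed to behave across the two workers here?
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService
from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(100)).shuffle().sharding_filter()

rs = MultiProcessingReadingService(num_workers=2)
dl = DataLoader2(dp, reading_service=rs)
dl.seed(42)  # fixed seed for reproducibility

first_epoch = list(dl)
```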
@ejguan I believe this has been fixed by https://github.com/pytorch/pytorch/pull/97287. Is that correct?
No, sorry, I'm afraid not. A fix could look like this: https://github.com/ejguan/pytorch/blob/f2cea87c1f9741e78c60c456bb0cd0f22d0689f7/torch/utils/data/graph_settings.py#L65

```
if len(sig.parameters) < 3:
    sharded = dp.apply_sharding(num_of_instances, instance_id)
else:
    sharded = dp.apply_sharding(num_of_instances, instance_id, sharding_group=sharding_group)
if sharded:
    applied...
```
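The idea of that snippet as a self-contained sketch (the helper name `_apply_sharding_compat` is mine, not torchdata/PyTorch API): only forward `sharding_group` when the datapipe's `apply_sharding` actually accepts it, so legacy two-argument implementations keep working.

```
import inspect


def _apply_sharding_compat(dp, num_of_instances, instance_id, sharding_group):
    # Hypothetical helper illustrating the fix above: inspect the signature of
    # the datapipe's apply_sharding and only forward sharding_group if the
    # implementation knows about sharding groups/priorities.
    sig = inspect.signature(dp.apply_sharding)
    if len(sig.parameters) < 3:
        # Legacy signature: apply_sharding(num_of_instances, instance_id)
        dp.apply_sharding(num_of_instances, instance_id)
    else:
        # Newer signature that is aware of sharding groups
        dp.apply_sharding(num_of_instances, instance_id, sharding_group=sharding_group)
    return dp
```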
> I am sorry that I think we currently don't support two `ShardingRoundRobinDispatcher`

This should potentially be taken into consideration as a use case with regard to https://github.com/pytorch/data/issues/1174
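To make the use case concrete, a minimal sketch (the pipeline and its steps are made up) of a graph with two `ShardingRoundRobinDispatcher` stages:

```
# Two round-robin dispatch points in one graph, e.g. one in front of an
# expensive decode step and another in front of an expensive transform.
from torchdata.datapipes.iter import IterableWrapper
from torch.utils.data.datapipes.iter.sharding import SHARDING_PRIORITIES

dp = IterableWrapper(range(1000)).shuffle()

# First non-replicable section: dispatch items round-robin to the workers.
dp = dp.sharding_round_robin_dispatch(sharding_group_filter=SHARDING_PRIORITIES.MULTIPROCESSING)
dp = dp.map(lambda x: x * 2)  # stand-in for an expensive step

# Second dispatcher later in the same graph -- according to the quoted
# comment, this is currently not supported.
dp = dp.sharding_round_robin_dispatch(sharding_group_filter=SHARDING_PRIORITIES.MULTIPROCESSING)
```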
Hey, thanks for the update. Does that mean that torchdata will become obsolete in the future? As I already indicated in older issues, what I see as the biggest weakness...
Hey @andrewkho, this is very cool! The design makes a lot of sense to me, and the focus on modularity and fine-grained control over parallelism and sharding is very appreciated!...