
[feature request] Upstream to core PyTorch `StatefulDataLoader` and `StatefulDistributedSampler`

Open vadimkantorov opened this issue 3 months ago • 2 comments

🚀 The feature

Being able to precisely recover the state of data loading is a popular feature. Would be great to have it in core to increase visibility of its existence :)
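For readers new to the idea, here is a minimal pure-Python sketch of what "recovering the state of data loading" can mean at the sampler level. It is not the torchdata implementation. `ResumableRandomSampler` and its fields are hypothetical names made up for illustration; only the `state_dict()` / `load_state_dict()` method pair mirrors the convention `StatefulDataLoader` follows.

```python
import random


class ResumableRandomSampler:
    """Hypothetical sketch: a shuffling index sampler whose position can be checkpointed."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0
        self.position = 0  # number of indices already yielded in the current epoch

    def _permutation(self):
        # Deterministic per-epoch shuffle, so the same order can be rebuilt after a restart.
        indices = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(indices)
        return indices

    def __iter__(self):
        indices = self._permutation()
        # Start from the saved position instead of from zero.
        while self.position < self.num_samples:
            index = indices[self.position]
            self.position += 1
            yield index
        # Epoch finished: move on so the next pass uses a fresh permutation.
        self.epoch += 1
        self.position = 0

    def __len__(self):
        return self.num_samples

    def state_dict(self):
        return {"seed": self.seed, "epoch": self.epoch, "position": self.position}

    def load_state_dict(self, state):
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.position = state["position"]


# Usage: consume part of an epoch, checkpoint, then resume in a fresh object.
sampler = ResumableRandomSampler(10, seed=123)
iterator = iter(sampler)
seen_before_restart = [next(iterator) for _ in range(4)]
checkpoint = sampler.state_dict()

restored = ResumableRandomSampler(10)
restored.load_state_dict(checkpoint)
seen_after_restart = list(restored)  # the 6 indices that were not yet yielded
```

A real stateful loader also has to account for worker processes and prefetched batches, which this sketch deliberately ignores.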

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

vadimkantorov · Sep 17 '25 18:09

Hi @vadimkantorov, we are currently working on upstreaming it:

https://github.com/ramanishsingh/pytorch/tree/upstream_sdl

ramanishsingh · Sep 17 '25 18:09

It would also be great to add a `DistributedSamplerWrapper` that wraps an existing `RandomSampler`, `SequentialSampler`, or any other custom `Sampler`, like the one in https://github.com/catalyst-team/catalyst/blob/master/catalyst/data/sampler.py#L499. Currently, for some reason, `DistributedSampler` mixes both concerns together: it decides the sample order and also shards the indices across ranks.
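To make the request concrete, here is a rough pure-Python sketch of such a wrapper. It is neither the catalyst implementation nor an existing PyTorch API. The class name follows the suggestion above, while the constructor arguments `num_replicas` and `rank` and the padding strategy are assumptions chosen for illustration. It accepts anything that is iterable and has `__len__`, which is what a sampler provides.

```python
import math


class DistributedSamplerWrapper:
    """Hypothetical sketch: shard the output of any index sampler across ranks."""

    def __init__(self, sampler, num_replicas, rank):
        if not 0 <= rank < num_replicas:
            raise ValueError("rank must be in [0, num_replicas)")
        self.sampler = sampler
        self.num_replicas = num_replicas
        self.rank = rank

    def __iter__(self):
        # The wrapped sampler alone decides the order (random, sequential, custom).
        # Every rank must see the same order, so a random sampler needs a shared seed.
        indices = list(self.sampler)
        if not indices:
            return iter([])
        # Pad by repeating indices from the start so every rank gets the same count.
        per_rank = math.ceil(len(indices) / self.num_replicas)
        total = per_rank * self.num_replicas
        repeats = math.ceil(total / len(indices))
        padded = (indices * repeats)[:total]
        # The wrapper only shards: rank r takes every num_replicas-th index.
        return iter(padded[self.rank::self.num_replicas])

    def __len__(self):
        return math.ceil(len(self.sampler) / self.num_replicas)


# Usage: 10 sequential indices split across 4 ranks (range stands in for a sampler).
shards = [list(DistributedSamplerWrapper(range(10), num_replicas=4, rank=r)) for r in range(4)]
```

The point of the split is that ordering stays in the wrapped sampler and sharding stays in the wrapper, so either part can be swapped independently.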

Another long-standing request is native utilities for handling arrays of strings as tensors (in a DataLoader-friendly way):

  • https://github.com/pytorch/pytorch/issues/101699

vadimkantorov · Sep 17 '25 20:09