
Add early support for `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator`

Open • byi8220 opened this issue 8 months ago • 1 comment

What does this PR do?

Fixes https://github.com/huggingface/accelerate/issues/2859

This PR does the following:

  1. Adds a new field, use_stateful_dataloader, to DataLoaderConfiguration. When it is set in the config, all DataLoaders prepared and returned by the Accelerator are StatefulDataLoader objects from the torchdata library.
  2. Creates a class, DataLoaderAdapter, which can wrap and act as either PyTorch's DataLoader or a variant of it such as StatefulDataLoader.
  3. Refactors DataLoaderShard, DataLoaderDispatcher, and SkipDataLoader to inherit from DataLoaderAdapter instead of DataLoader.
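
As a rough illustration of the adapter idea in points 1 and 2, here is a minimal, dependency-free sketch. It is not the code in this PR: `PlainLoader`, `StatefulLoader`, and `LoaderAdapter` are hypothetical stand-ins for `DataLoader`, `StatefulDataLoader`, and `DataLoaderAdapter`, and the attribute forwarding via `__getattr__` is just one assumed way to make the wrapper "act as" the wrapped loader.

```python
class PlainLoader:
    """Hypothetical stand-in for torch.utils.data.DataLoader."""

    def __init__(self, dataset, batch_size=1):
        self.dataset = list(dataset)
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[start:start + self.batch_size]

    def __len__(self):
        # Ceiling division: the last batch may be smaller than batch_size.
        return -(-len(self.dataset) // self.batch_size)


class StatefulLoader(PlainLoader):
    """Hypothetical stand-in for StatefulDataLoader: also exposes state_dict()."""

    def __init__(self, dataset, batch_size=1):
        super().__init__(dataset, batch_size)
        self.batches_yielded = 0

    def __iter__(self):
        for batch in super().__iter__():
            self.batches_yielded += 1
            yield batch

    def state_dict(self):
        return {"batches_yielded": self.batches_yielded}


class LoaderAdapter:
    """Wraps whichever loader class the flag selects and forwards to it."""

    def __init__(self, dataset, use_stateful_dataloader=False, **kwargs):
        loader_cls = StatefulLoader if use_stateful_dataloader else PlainLoader
        self.base_dataloader = loader_cls(dataset, **kwargs)

    def __getattr__(self, name):
        # Called only when normal lookup fails; forward to the wrapped loader.
        if name == "base_dataloader":
            raise AttributeError(name)
        return getattr(self.base_dataloader, name)

    def __iter__(self):
        return iter(self.base_dataloader)

    def __len__(self):
        return len(self.base_dataloader)


plain = LoaderAdapter(range(6), batch_size=2)
stateful = LoaderAdapter(range(6), use_stateful_dataloader=True, batch_size=2)
plain_batches = list(plain)
stateful_batches = list(stateful)
```

In this sketch, subclasses (the analogue of DataLoaderShard and friends) would inherit from `LoaderAdapter` and gain `state_dict()` only when the stateful variant is selected.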

Testing

Added new unit tests checking that StatefulDataLoader can be dropped in and that its state can be saved and loaded.
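
The shape of such a save/load check can be sketched as follows. This is an assumption about what a round-trip test might look like, not a copy of the tests in this PR: `ResumableLoader` is a hypothetical stand-in that only mimics the `state_dict()` / `load_state_dict()` interface, and `check_resume` is an illustrative helper name.

```python
class ResumableLoader:
    """Hypothetical stand-in mimicking state_dict() / load_state_dict()."""

    def __init__(self, dataset, batch_size):
        self.dataset = list(dataset)
        self.batch_size = batch_size
        self._next_batch = 0

    def __iter__(self):
        num_batches = -(-len(self.dataset) // self.batch_size)
        while self._next_batch < num_batches:
            start = self._next_batch * self.batch_size
            self._next_batch += 1
            yield self.dataset[start:start + self.batch_size]
        self._next_batch = 0  # a finished epoch starts over next time

    def state_dict(self):
        return {"next_batch": self._next_batch}

    def load_state_dict(self, state):
        self._next_batch = state["next_batch"]


def check_resume(make_loader, batches_before_save):
    """Save state mid-epoch, restore into a fresh loader, compare to a full pass."""
    reference = list(make_loader())

    first = make_loader()
    iterator = iter(first)
    seen = [next(iterator) for _ in range(batches_before_save)]
    state = first.state_dict()

    second = make_loader()
    second.load_state_dict(state)
    resumed = list(second)

    # Batches seen before the save plus batches after the restore
    # should equal one uninterrupted epoch.
    return seen + resumed == reference


resume_ok = check_resume(lambda: ResumableLoader(range(10), 3), batches_before_save=2)
```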

Caveats

  • The torchdata package may conflict with accelerate; see https://github.com/huggingface/accelerate/issues/2894
    • However, if torchdata is not installed, all existing tests pass, suggesting this change introduces no regressions.
  • torchdata.stateful_dataloader.StatefulDataLoader is only available in the beta build of torchdata; it is not a stable feature.
  • Because this adds a dependency on a nightly package, almost none of the tests added in this PR run under the existing images or imports.
  • This has only been tested on my local workstation using a single GPU.
  • The implementation of DataLoaderAdapter is somewhat invasive and relies on some questionable reflection.
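
Since torchdata is optional here, one way to keep the default path unaffected is to guard on its availability. The following is a minimal sketch under that assumption; `stateful_dataloader_available` and `pick_loader_name` are hypothetical helper names used for illustration, not functions confirmed to exist in accelerate.

```python
import importlib.util


def stateful_dataloader_available():
    """Return True if torchdata.stateful_dataloader appears importable (hypothetical helper)."""
    try:
        # find_spec imports the parent package, so a missing or broken
        # torchdata install surfaces as an ImportError here.
        return importlib.util.find_spec("torchdata.stateful_dataloader") is not None
    except ImportError:
        return False


def pick_loader_name(use_stateful_dataloader):
    """Choose which loader class to use, failing early if the dependency is missing."""
    if not use_stateful_dataloader:
        return "DataLoader"
    if not stateful_dataloader_available():
        raise ImportError(
            "use_stateful_dataloader=True requires a torchdata build that "
            "provides torchdata.stateful_dataloader."
        )
    return "StatefulDataLoader"


default_choice = pick_loader_name(False)
```

With a guard like this, code paths that never set the flag do not touch torchdata at all, which matches the observation that existing tests pass when it is not installed.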

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [X] Did you read the contributor guideline, Pull Request section?
  • [X] Was this discussed/approved via a GitHub issue (see above)?
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [X] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@muellerzr

byi8220 • Jun 26 '24 22:06