Support AcceleratorConfig.use_stateful_dataloader in Trainer

Open byi8220 opened this issue 4 months ago • 0 comments

What does this PR do?

This PR does the following:

  1. Add a new field, use_stateful_dataloader, to the AcceleratorConfig used by TrainingArguments. When set to true, it is passed through as use_stateful_dataloader to the DataLoaderConfiguration used to initialize the Trainer's backing Accelerator.
  2. Add a new field train_dataloader_state_dict to TrainerState, which is used to persist the StatefulDataLoader's state_dict at the time of checkpointing.
  3. When resuming from a checkpoint with a StatefulDataLoader-backed Trainer, have the train dataloader load its state_dict from the saved train_dataloader_state_dict, instead of skipping batches to resume within an epoch.
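
To illustrate the difference between the two resume strategies in item 3, here is a minimal pure-Python sketch. It does not use torchdata, accelerate, or the Trainer: ToyStatefulLoader, resume_by_skipping, resume_from_state, and the "next_index" key are hypothetical names invented for this example. Only the state_dict / load_state_dict method names and the train_dataloader_state_dict field name follow the description above.

```python
from typing import Any, Dict, Iterator, List


class ToyStatefulLoader:
    """Hypothetical stand-in for a stateful dataloader (NOT torchdata's class)."""

    def __init__(self, data: List[int], batch_size: int) -> None:
        self.data = data
        self.batch_size = batch_size
        self._next_index = 0  # index of the first sample not yet yielded

    def __iter__(self) -> Iterator[List[int]]:
        while self._next_index < len(self.data):
            start = self._next_index
            # Advance before yielding so state_dict() reflects the batch just served.
            self._next_index = start + self.batch_size
            yield self.data[start:start + self.batch_size]
        self._next_index = 0  # epoch exhausted; the next epoch starts from the beginning

    def state_dict(self) -> Dict[str, Any]:
        return {"next_index": self._next_index}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._next_index = state["next_index"]


def resume_by_skipping(data: List[int], batch_size: int, batches_seen: int) -> List[List[int]]:
    """Skip-based resume: re-iterate and discard the batches already trained on."""
    iterator = iter(ToyStatefulLoader(data, batch_size))
    for _ in range(batches_seen):
        next(iterator)  # each skipped batch is still loaded, then thrown away
    return list(iterator)


def resume_from_state(data: List[int], batch_size: int, state: Dict[str, Any]) -> List[List[int]]:
    """State-based resume: restore the loader position directly from the saved state."""
    loader = ToyStatefulLoader(data, batch_size)
    loader.load_state_dict(state)
    return list(loader)


# Simulate training that is checkpointed and interrupted after 3 batches.
data = list(range(10))
loader = ToyStatefulLoader(data, batch_size=2)
for step, batch in enumerate(loader, start=1):
    if step == 3:
        # Mirrors the idea of persisting the loader state alongside the trainer state.
        checkpoint = {"train_dataloader_state_dict": loader.state_dict()}
        break

resumed = resume_from_state(data, 2, checkpoint["train_dataloader_state_dict"])
skipped = resume_by_skipping(data, 2, batches_seen=3)
print(resumed == skipped)
```

In this toy, both strategies yield the same remaining batches. The state-based path avoids re-loading the skipped batches, which is the motivation for restoring the state_dict rather than skipping.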

This PR was tested through the following:

  1. Unit test TrainerIntegrationTest.test_train_and_eval_dataloaders_with_use_stateful_dataloader, which is a sanity test on the dataloaders used.
  2. Unit test TrainerIntegrationTest.test_resume_training_with_stateful_dataloaders, which mirrors the other test_resume_training.* tests, asserting that resuming from checkpoint behaves sensibly, and that the saved checkpoints do contain saved state_dicts.

Minor changes: this PR adds dependencies on accelerate>=1.0.0 and torchdata>=0.8.0 to transformers, and fixes a minor issue with the Trainer test cases.

Fixes #31441

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [X] Did you read the contributor guideline, Pull Request section?
  • [X] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [X] Did you write any new necessary tests?

Who can review?

@muellerzr and @SunMarc

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

byi8220 • Oct 16 '24 23:10