transformers
Update stateful_callbacks state before saving checkpoint
Fixes
This PR addresses an issue where the state of stateful callbacks, such as EarlyStoppingCallback, was not being updated before checkpoints were saved. As a result, a resumed training run did not have access to the latest state of these callbacks.
Description
Updated the state of stateful callbacks before saving checkpoints to ensure that their state is preserved and correctly restored when resuming training.
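To illustrate the idea, here is a minimal, library-free sketch of the pattern; it is not the actual Trainer code. `PatienceCallback`, `save_checkpoint`, and `load_callback` are hypothetical names invented for this example, and the `{"args": ..., "attributes": ...}` layout is only assumed to resemble what a stateful callback exports.

```python
import json
import os
import tempfile


class PatienceCallback:
    """Hypothetical stand-in for a stateful callback such as EarlyStoppingCallback."""

    def __init__(self, patience=3):
        self.patience = patience
        self.counter = 0  # mutated during training, so it must be re-exported

    def on_evaluate(self, improved):
        # Reset the counter on improvement, otherwise count one more bad eval.
        self.counter = 0 if improved else self.counter + 1

    def state(self):
        # Assumed export layout: constructor args plus mutable attributes.
        return {
            "args": {"patience": self.patience},
            "attributes": {"counter": self.counter},
        }


def save_checkpoint(callbacks, trainer_state, path):
    # Refresh every callback snapshot right before writing, so the file
    # holds the current counters rather than the ones captured at init.
    for cb in callbacks:
        trainer_state["stateful_callbacks"][type(cb).__name__] = cb.state()
    with open(path, "w") as f:
        json.dump(trainer_state, f)


def load_callback(path):
    # Rebuild the callback from the saved args, then restore its attributes.
    with open(path) as f:
        saved = json.load(f)["stateful_callbacks"]["PatienceCallback"]
    cb = PatienceCallback(**saved["args"])
    for name, value in saved["attributes"].items():
        setattr(cb, name, value)
    return cb
```

Without the refresh loop in `save_checkpoint`, the file would keep the snapshot taken when the trainer state was first built, and a resumed run would start with `counter == 0` no matter how many evaluations had already failed to improve.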
Suggested Reviewers
@muellerzr @amyeroberts @SunMarc
Gentle ping @muellerzr
@pedrobrs can you rebase from main? This should fix the failing tests
Looks like this logic broke some of the tests in test_trainer.py:
tests/trainer/test_trainer_callback.py::TrainerCallbackTest::test_stateful_duplicate_callbacks
Looks like this is because this change doesn't actually wind up restoring the state?
I saw that error when I first pushed. I'll investigate to identify and fix the problem. At least one other test (marked as required) also failed, and I don't fully understand it. The error occurs in step_and_quality / check_repository_consistency. The error message is:
RuntimeError: Failed to import transformers.models.audio_spectrogram_transformer.feature_extraction_audio_spectrogram_transformer because of the following error (look up to see its traceback): libtorch_cuda.so: cannot open shared object file: No such file or directory
I may not be able to resolve this issue on my own. What should I do about it?
This one is unrelated to you and was already fixed on main!
cc @ArthurZucker for final 🚀