transformers

Update stateful_callbacks state before saving checkpoint

Open brs-pt opened this issue 1 year ago • 6 comments

Fixes

This PR addresses an issue where the state of stateful callbacks, such as EarlyStoppingCallback, was not being updated before checkpoints were saved. As a result, a resumed training run did not have access to the latest state of these callbacks.

Description

Updated the state of stateful callbacks before saving checkpoints to ensure that their state is preserved and correctly restored when resuming training.
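
To illustrate the idea, here is a minimal, library-free sketch of the pattern described above. It is not the code from this PR: `ToyEarlyStopping`, `ToyTrainerState`, `refresh_stateful_callbacks`, and `save_checkpoint` are hypothetical stand-ins, and only the `stateful_callbacks` name and the EarlyStoppingCallback example come from the description. The sketch assumes each stateful callback can export its current state as a dictionary.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyEarlyStopping:
    """Hypothetical stand-in for a stateful callback such as EarlyStoppingCallback."""

    patience: int = 3
    patience_counter: int = 0  # mutated while training runs

    def state(self) -> dict:
        # Snapshot of what would be needed to rebuild this callback on resume.
        return {
            "args": {"patience": self.patience},
            "attributes": {"patience_counter": self.patience_counter},
        }


@dataclass
class ToyTrainerState:
    """Hypothetical stand-in for the trainer state written into a checkpoint."""

    global_step: int = 0
    stateful_callbacks: dict = field(default_factory=dict)


def refresh_stateful_callbacks(state, callbacks):
    """Overwrite the stored snapshots with each callback's current state."""
    for cb in callbacks:
        state.stateful_callbacks[type(cb).__name__] = cb.state()


def save_checkpoint(state, callbacks) -> str:
    # Refresh first, then serialize, so the checkpoint is not stale.
    refresh_stateful_callbacks(state, callbacks)
    return json.dumps(
        {
            "global_step": state.global_step,
            "stateful_callbacks": state.stateful_callbacks,
        }
    )


callback = ToyEarlyStopping()
state = ToyTrainerState()
refresh_stateful_callbacks(state, [callback])  # snapshot taken when training starts

callback.patience_counter = 2  # training continues without improvement
state.global_step = 500
checkpoint = json.loads(save_checkpoint(state, [callback]))
```

Without the refresh inside `save_checkpoint`, the serialized `patience_counter` would still be the value captured at the start of training, which is the stale-state problem the description refers to.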

Suggested Reviewers

@muellerzr @amyeroberts @SunMarc

brs-pt avatar Jul 20 '24 23:07 brs-pt

Gentle ping @muellerzr

amyeroberts avatar Aug 20 '24 09:08 amyeroberts

@pedrobrs can you rebase from main? This should fix the failing tests

muellerzr avatar Aug 21 '24 17:08 muellerzr

Looks like this logic broke some of the tests in test_trainer_callback.py:

 tests/trainer/test_trainer_callback.py::TrainerCallbackTest::test_stateful_duplicate_callbacks

Looks like this is because this change doesn't actually wind up restoring the state?

muellerzr avatar Aug 21 '24 18:08 muellerzr

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

I saw that error when pushing the first time. I'll investigate the issue to identify and fix the problem. There is at least one other failed test (marked as required) that I don't fully understand. The error occurs in step_and_quality / check_repository_consistency. The error message is:

RuntimeError: Failed to import transformers.models.audio_spectrogram_transformer.feature_extraction_audio_spectrogram_transformer because of the following error (look up to see its traceback): libtorch_cuda.so: cannot open shared object file: No such file or directory

I may not be able to resolve this issue on my own. What should I do about it?

brs-pt avatar Aug 21 '24 19:08 brs-pt

This one is unrelated to you and was already fixed on main!

ArthurZucker avatar Aug 22 '24 13:08 ArthurZucker

cc @ArthurZucker for final 🚀

muellerzr avatar Aug 23 '24 15:08 muellerzr