transformers icon indicating copy to clipboard operation
transformers copied to clipboard

New option called `"best"` for `args.save_strategy`.

Open seanswyi opened this issue 7 months ago • 6 comments

What does this PR do?

Addresses https://github.com/huggingface/transformers/issues/31626.

Adds a new option called "best" for TrainingArguments.save_strategy which saves the model checkpoint each time a new best performance is achieved.

Details

  1. The previous _save_checkpoint method was in charge of not only saving the model checkpoint but also determining the best metric and best checkpoint. The logic for determining a new best metric was separated out into the _determine_best_metric method.
  2. _determine_best_metric is called after every evaluation inside of _maybe_log_save_evaluate. The return value new_best_metric is used to determine whether or not a new best metric has been achieved, and if the save strategy is "best" then the TrainerControl's should_save flag is switched on.
    • Contrary to what I initially thought, best_metric does not seem to be tracked by default. Rather, it's only tracked when args.metric_for_best_model is provided. I believe that a best metric of some sort should always be tracked, and therefore if a value is not provided then the validation loss is used to determine a new best.
  3. A new object called SaveStrategy was created in trainer_utils that adds a new attribute called BEST to the previous IntervalStrategy.

I'm not sure if I like the rather "hack-y" way that I implemented this by manually switching the TrainerControl's should_save flag rather than delegating it to the callback handler like the other flags are dealt with. The problem is that the flags are normally updated before calling _maybe_log_save_evaluate inside of the inner training loop, which means there's no way for us to determine whether or not a new best metric has been achieved with the current logic. I'm not sure if I'm making sense, but I'm open to any other suggestions.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@muellerzr @SunMarc

seanswyi avatar Jul 06 '24 13:07 seanswyi