
Add Trainer support for ReduceLROnPlateau

Open · pie3636 opened this issue 1 year ago

What does this PR do?

This PR solves #16503 by adding support for PyTorch's ReduceLROnPlateau scheduler to Trainer.

It does so by adding a new REDUCE_ON_PLATEAU field to SchedulerType and a new reduce_lr_on_plateau_args parameter to TrainingArguments that is parsed at initialization, which avoids adding 9 new individual arguments. The scheduler reuses the metric stored in metric_for_best_model, and its step is delayed until after evaluation since it requires metrics to be populated.
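If the design lands as described, enabling the scheduler from TrainingArguments could look roughly like the sketch below. The scheduler-type string and the eval_loss metric are assumptions based on this description, not the merged API; evaluation has to be enabled so the monitored metric exists.

from transformers import TrainingArguments

# Sketch only: argument names follow the PR description above and may differ in the final version.
training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",               # metrics must be computed so the scheduler can step on them
    metric_for_best_model="eval_loss",         # the metric the scheduler monitors, per the PR description
    lr_scheduler_type="reduce_lr_on_plateau",  # assumed string value of the new SchedulerType.REDUCE_ON_PLATEAU field
    # reduce_lr_on_plateau_args=...,           # per the PR, extra scheduler kwargs are parsed here (exact format not shown)
)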

I'm not sure whether it is due to the complexity of Trainer, my lack of experience (this is my first PR to a large project) or the uniqueness of ReduceLROnPlateau compared to other schedulers, but this PR feels a bit hacky, so I welcome any feedback.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Looking at #16503, I believe this is for @sgugger.

pie3636 avatar Apr 26 '23 15:04 pie3636

The documentation is not available anymore as the PR was closed or merged.

Thanks for the review! I believe this should do it. There isn't much in the way of default arguments, but ReduceLROnPlateau is quite different from other schedulers in the first place.

pie3636 avatar Apr 28 '23 09:04 pie3636

Hi @pie3636 and @sgugger, thanks for this PR! I would like to use the ReduceLROnPlateau scheduler, but I don't understand which parameters (patience, factor, cooldown) it has by default and where I can change them. If I'm right, it uses the default ones: lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.2, patience=5, cooldown=2), and if I want to test new ones I have to build my own scheduler and pass it to HF? Thanks a lot for this new feature!

lombardata avatar Oct 27 '23 09:10 lombardata

@lombardata The values you are referring to (factor=0.2, patience=5, cooldown=2) are example values used in the unit test. The actual default parameters are the ones provided by PyTorch, which are, as I'm writing these lines: optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False.
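For reference, a minimal sketch constructing the scheduler with those defaults spelled out explicitly (values as documented by PyTorch at the time of writing; check your installed version, since defaults can change):

import torch

# `model` stands in for whatever model you are training.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode='min',             # the monitored metric is expected to decrease
    factor=0.1,             # multiply the learning rate by this factor when a plateau is detected
    patience=10,            # scheduler steps with no improvement before the rate is reduced
    threshold=0.0001,
    threshold_mode='rel',
    cooldown=0,             # steps to wait after a reduction before monitoring resumes
    min_lr=0,
    eps=1e-08,
    verbose=False,
)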

If you want to use different values, you indeed need to build your own scheduler and optimizer, and pass them as a tuple to the Trainer class using the optimizers argument.

pie3636 avatar Nov 06 '23 21:11 pie3636

Ok, thank you very much @pie3636 for the clarification. If anyone reads this post and is interested in how to pass a custom ReduceLROnPlateau scheduler to the Trainer, here is a simple way to do it:

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0001)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=prepared_ds["train"],
    eval_dataset=prepared_ds["validation"],
    tokenizer=feature_extractor,
    compute_metrics=compute_metrics,
    optimizers=(optimizer, lr_scheduler)
)

lombardata avatar Nov 07 '23 08:11 lombardata