Add Trainer support for ReduceLROnPlateau
What does this PR do?
This PR solves #16503 by adding support for PyTorch's `ReduceLROnPlateau` to `Trainer`.

It does so by adding a new `REDUCE_ON_PLATEAU` field to `SchedulerType` and a new `reduce_lr_on_plateau_args` parameter to `TrainingArguments`, which is parsed at initialization to avoid adding 9 new individual arguments. The scheduler reuses the metric stored in `metric_for_best_model`, and its stepping is delayed until after evaluation since it requires metrics to be populated.
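For reference, here is a minimal sketch of how training might be configured with this scheduler, assuming the interface described above; the `"reduce_lr_on_plateau"` scheduler-type string and the `reduce_lr_on_plateau_args` format are taken from this PR's description and are not a confirmed final API:

```python
# Sketch only: assumes the interface proposed in this PR (the scheduler type
# string and reduce_lr_on_plateau_args format are not a confirmed final API).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",               # scheduler can only step after evaluation
    metric_for_best_model="eval_loss",         # metric monitored by ReduceLROnPlateau
    lr_scheduler_type="reduce_lr_on_plateau",  # assumed value for the new SchedulerType
    # reduce_lr_on_plateau_args="factor=0.5,patience=2",  # proposed in this PR
)
```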
I'm not sure whether it is due to the complexity of `Trainer`, my lack of experience (this is my first PR to a large project), or the uniqueness of `ReduceLROnPlateau` compared to other schedulers, but this PR feels a bit hacky, so I welcome any feedback.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Looking at #16503, I believe this is for @sgugger.
The documentation is not available anymore as the PR was closed or merged.
Thanks for the review! I believe this should do it. There isn't much in the way of default arguments, but `ReduceLROnPlateau` is quite different from other schedulers in the first place.
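For context on that difference: in PyTorch, `ReduceLROnPlateau.step()` takes the monitored metric as an argument, while other schedulers step unconditionally, which is why `Trainer` can only step it once evaluation metrics exist. A small standalone illustration (the model and metric value are placeholders):

```python
import torch

# Toy setup; the linear model and the 0.42 metric are placeholders for illustration.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.step()  # schedulers are normally stepped after an optimizer step

# A typical scheduler steps unconditionally:
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
cosine.step()

# ReduceLROnPlateau instead steps on a metric value, e.g. the latest eval loss:
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=2)
plateau.step(0.42)
```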
Hi @pie3636 and @sgugger,
Thanks for this PR!
I would like to use the `ReduceLROnPlateau` scheduler, but I don't understand which parameters (patience, factor, cooldown) it uses by default and where I can change them.
If I'm right, it uses the default ones:

```python
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.2, patience=5, cooldown=2)
```

and if I want to test new ones, do I have to build my own scheduler and pass it to HF?
Thanks a lot for this new feature!
@lombardata The values you are referring to (`factor=0.2, patience=5, cooldown=2`) are example values used in the unit test.
The actual default parameters are the ones provided by PyTorch, which are, as I'm writing these lines: `optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False`.
If you want to use different values, you indeed need to build your own scheduler and optimizer, and pass them as a tuple to the `Trainer` class using the `optimizers` argument.
Ok, thank you very much @pie3636 for the clarification.
If anyone reads this post and is interested in how to pass a custom `ReduceLROnPlateau` scheduler to the `Trainer`, here is a simple way to do it:

```python
import torch
from transformers import Trainer

# model, training_args, collate_fn, prepared_ds, feature_extractor and
# compute_metrics are defined elsewhere in your own training script.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0001)
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5, verbose=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=prepared_ds["train"],
    eval_dataset=prepared_ds["validation"],
    tokenizer=feature_extractor,
    compute_metrics=compute_metrics,
    optimizers=(optimizer, lr_scheduler),
)
```