[FEAT] Add option to support user defined learning rate scheduler for NeuralForecast Models
Rationale
- Similar to the work that added support for a user-defined optimizer, we can now accept a user-defined learning rate scheduler. This feature was requested in https://github.com/Nixtla/neuralforecast/issues/852#issuecomment-1961923861 and https://nixtlacommunity.slack.com/archives/C031M8RLC66/p1713516602654109 (a usage sketch follows below).
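A minimal usage sketch, assuming the new `lr_scheduler` / `lr_scheduler_kwargs` arguments mirror the existing `optimizer` / `optimizer_kwargs` ones; the model, data, and scheduler choices here are illustrative only:

```python
import torch
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

model = NHITS(
    h=12,
    input_size=24,
    max_steps=100,
    # Pass the scheduler class (not an instance) plus its constructor kwargs.
    lr_scheduler=torch.optim.lr_scheduler.StepLR,
    lr_scheduler_kwargs={"step_size": 30, "gamma": 0.1},
)
nf = NeuralForecast(models=[model], freq="M")
nf.fit(df=AirPassengersDF)
```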
@jmoralez Please review. Hope this might be useful to the community too.
@JQGoh Thanks for your work, the PR looks good to me. @cchallu @jmoralez I'm happy to add this as a feature to neuralforecast, wdyt?
@jmoralez @cchallu If you have some time, I would appreciate your review on this. Since new models have recently been added to neuralforecast, I would love to get this merged (subject to approval/review) so that the newly implemented models can also adopt these arguments.
Can you please add warnings if the user provides `lr_scheduler_kwargs` but doesn't provide `lr_scheduler` (that the kwargs will be ignored)? We got a similar request for the optimizer from the sktime folks.
@jmoralez Sounds good to me. I shall add warnings for the cases where:
- the user provides `lr_scheduler_kwargs` but doesn't provide `lr_scheduler`
- the user provides `optimizer_kwargs` but doesn't provide `optimizer`
If you have the link/reference to the sktime request, please mention it in this PR too. Thanks
https://github.com/sktime/sktime/pull/6235#issue-2216009123
I was facing the same issue when I wanted to use a custom scheduler, and your solution is pretty much what I have. The only thing I am missing in it is that I think we should also be able to change the other arguments of the `lr_scheduler_config`, like the frequency or the interval (see https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#configure-optimizers for the list of other parameters). My solution was in fact to add two new kwargs: `lr_scheduler_kwargs`, as you have, and `lr_scheduler_config`, as expected by Lightning, only adding the actual scheduler to it in `configure_optimizers` with `lr_scheduler_config['scheduler'] = self.lr_scheduler(optimizer=optimizer, **lr_scheduler_kwargs)`. I think that if someone is bothering to manually change the lr_scheduler, it makes sense to give them full control over it.
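A minimal sketch of the pattern described above (an editorial illustration, not code from the PR; the attributes `self.lr_scheduler`, `self.lr_scheduler_kwargs`, `self.lr_scheduler_config`, and `self.learning_rate` are assumed names):

```python
import torch

def configure_optimizers(self):
    # Build the optimizer as usual.
    optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    # Start from the user-supplied Lightning scheduler config,
    # e.g. {"interval": "epoch", "frequency": 1, "monitor": "val_loss"}.
    lr_scheduler_config = dict(self.lr_scheduler_config or {})

    # Attach the actual scheduler instance, built from the user-supplied class
    # and kwargs, so it is guaranteed to wrap the same optimizer.
    lr_scheduler_config["scheduler"] = self.lr_scheduler(
        optimizer=optimizer, **self.lr_scheduler_kwargs
    )

    return {"optimizer": optimizer, "lr_scheduler": lr_scheduler_config}
```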
@BrunoBelucci Thanks for your suggestion. I lean toward allowing users more freedom in customizing the behaviour, including the frequency.
On top of the prepared PR, I think I may need an additional argument called `lr_scheduler_config` that represents the set of arguments detailed in https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#configure-optimizers, but it shall ignore the "scheduler" key (since we will specify it separately, plus we can ensure that the lr_scheduler uses the same optimizer).
If the Nixtla team agrees that `lr_scheduler_config` is a nice-to-have feature, I can prepare it in the next PR.
cc: @jmoralez
I'm not really a fan of adding so many arguments to all models. Would it be possible to instead provide a function that overrides the `configure_optimizers` method, either by calling it or by monkey patching?
Honestly, I think that in this case the cleanest solution is to add one more argument. We already have arguments for the trainer, optimizer, and scheduler. Introducing a different approach for configuring the scheduler, which is part of the same system, could confuse users and require them to write additional boilerplate code to achieve their goals. Here are some scenarios to further develop my point:
- Using the `CosineAnnealingLR` scheduler: the default scheduler configuration is fine because the scheduler is not affected by it.
- Using the `OneCycleLR` scheduler: the default scheduler configuration is fine if the user knows that the default is `{'frequency': 1, 'interval': 'step'}`. They would only need to calculate the number of steps to pass as `total_steps` in the `lr_scheduler_kwargs`. However, if they do not know this, they would need to check the code (since it is not documented) to ensure the model trains as intended.
- Using the `ReduceLROnPlateau` scheduler: in this case the user needs to change the scheduler configuration because, typically, we want to identify a plateau by epoch rather than by step. Additionally, the user has to specify the `"monitor"` metric they want (see the sketch after this comment). Without the ability to pass the scheduler configuration directly, users would need one approach to pass the scheduler and its kwargs and another approach (such as calling another function or monkey patching) to overwrite the scheduler configuration.

I see no reason for users to have to handle each scenario above differently. Additionally, this approach would allow us to document the default behavior clearly.
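As an editorial illustration of the `ReduceLROnPlateau` scenario, this is roughly what the Lightning scheduler config would need to contain. Note that a user-facing `lr_scheduler_config` argument is only proposed in this thread, so treat these names as hypothetical rather than an existing neuralforecast API:

```python
import torch

# Scheduler class and its own constructor kwargs.
lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau
lr_scheduler_kwargs = {"mode": "min", "factor": 0.5, "patience": 3}

# Extra entries Lightning expects in its lr_scheduler dict for this scheduler:
# step once per epoch and watch a logged metric for the plateau.
lr_scheduler_config = {
    "interval": "epoch",
    "frequency": 1,
    "monitor": "train_loss",  # hypothetical metric name; depends on what the model logs
}
```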
What I mean is that we currently have two arguments (for the optimizer), this adds two more (for the scheduler), and you're proposing adding a fifth, when all of these are going to be used in the same method (`configure_optimizers`). They all could have been just one argument where the user passes a callable that takes the model parameters and returns what PyTorch Lightning expects from the `configure_optimizers` method.
Given that sktime is already using the optimizer arguments, it'd be a bad experience to deprecate them and introduce a new one that takes a function, so I think we should move forward with adding more rather than deprecate something we recently introduced. I just wish we'd done the single-argument approach from the start.
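For clarity, a sketch of what such a single callable could look like (a hypothetical API, not something implemented in this PR):

```python
import torch

def my_configure_optimizers(params):
    """Takes the model parameters and returns the dict PyTorch Lightning
    expects from `configure_optimizers`."""
    optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step", "frequency": 1},
    }
```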
@jmoralez I also wish I could have implemented it differently; I did not consider the option of modifying the default `configure_optimizers` behavior.
By the way, shall we reconsider this implementation and revert this PR? I implemented the option of modifying the `configure_optimizers` behavior via `set_configure_optimizers` at the BaseModel level, please check the work in
https://github.com/Nixtla/neuralforecast/pull/1015
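For reference, a minimal sketch of the underlying idea of overriding `configure_optimizers` directly by subclassing a model (an editorial illustration, independent of the exact `set_configure_optimizers` API in PR #1015):

```python
import torch
from neuralforecast.models import NHITS

class NHITSCustomOptim(NHITS):
    def configure_optimizers(self):
        # Full control over optimizer and scheduler, returned in the format
        # PyTorch Lightning expects from `configure_optimizers`.
        optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```

The subclass can then be used anywhere `NHITS` is used.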
Thank you so much for the implementation of a custom lr scheduler!
@BrunoBelucci: Could you explain how it is possible to implement a `ReduceLROnPlateau` scheduler? I think some of your earlier discussion revolved around it; however, I couldn't make it work yet. Currently I am passing the `lr_scheduler` and `lr_scheduler_kwargs` parameters. However, I get the error below when using `ReduceLROnPlateau`. Other schedulers, like OneCycle, work perfectly though!
`MisconfigurationException('The lr scheduler dict must include a monitor when a ReduceLROnPlateau scheduler is used. For example: {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "monitor": "your_loss"}}')`
@MLfreakPy Could you open a new issue and continue the discussion on this topic there? You could link this PR as a reference. As this PR has already addressed its original purpose, it will be more beneficial for other users to follow the topic in a dedicated issue. Thanks for the interest, and when I have more time, I will definitely look into the use of `ReduceLROnPlateau`!