
LearningRateFinder during training

Open X-Bruce-Y opened this issue 1 year ago • 0 comments

Description & Motivation

The LearningRateFinder callback is great for finding a learning rate that has the potential to reduce the training loss from the start. However, in my case, from epoch 2 onward the learning rate needs to be decreased substantially and promptly, otherwise the loss stops dropping. That makes me wonder why the technique is not extended to the whole training process. Although each affected epoch would take longer (presumably less than 2x), the overall training effectiveness should improve.

Pitch

Extend the LearningRateFinder callback so that it can be invoked throughout training, with a user-defined interval (probably in epochs), start_epoch, end_epoch, etc.
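
For reference, a minimal sketch of what this could look like by subclassing the existing callback. The `PeriodicLRFinder` name and the `every_n_epochs` / `start_epoch` / `end_epoch` parameters are purely illustrative, and it assumes `LearningRateFinder.lr_find` can be re-run from `on_train_epoch_start`:

```python
from lightning.pytorch.callbacks import LearningRateFinder


class PeriodicLRFinder(LearningRateFinder):
    """Hypothetical extension: re-run the LR search every N epochs within a window."""

    def __init__(self, every_n_epochs=1, start_epoch=0, end_epoch=None, **kwargs):
        super().__init__(**kwargs)
        self.every_n_epochs = every_n_epochs
        self.start_epoch = start_epoch
        self.end_epoch = end_epoch

    def on_fit_start(self, trainer, pl_module):
        # Skip the default single search at fit start; searches run per epoch instead.
        return

    def on_train_epoch_start(self, trainer, pl_module):
        epoch = trainer.current_epoch
        in_window = epoch >= self.start_epoch and (
            self.end_epoch is None or epoch <= self.end_epoch
        )
        if in_window and (epoch - self.start_epoch) % self.every_n_epochs == 0:
            # lr_find writes the found LR back to the module when update_attr is enabled.
            self.lr_find(trainer, pl_module)
```

Usage would then be something like `Trainer(callbacks=[PeriodicLRFinder(every_n_epochs=1, start_epoch=2)])`.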

Alternatives

Currently it is possible to train for one epoch, save a checkpoint, re-initialize the model from that checkpoint, run the learning rate finder to pick a new optimal learning rate, and repeat. However, this is indirect and slows training down considerably.
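
For completeness, that manual workaround would look roughly like the sketch below, assuming the `Tuner` API from `lightning.pytorch.tuner`. `MyLightningModule`, `num_epochs`, and the checkpoint paths are placeholders:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.tuner import Tuner

ckpt_path = None
for epoch in range(num_epochs):
    # Re-create the model, optionally from the previous epoch's checkpoint.
    model = (
        MyLightningModule.load_from_checkpoint(ckpt_path)
        if ckpt_path
        else MyLightningModule()
    )
    trainer = Trainer(max_epochs=1)
    # Search for a fresh learning rate and write it back to the model.
    Tuner(trainer).lr_find(model, update_attr=True)
    trainer.fit(model)
    ckpt_path = f"epoch_{epoch}.ckpt"
    trainer.save_checkpoint(ckpt_path)
```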

Additional context

No response

cc @borda

X-Bruce-Y · Jun 24 '24 08:06