autoclip Interaction with learning rate schedule

Status: Open. Permafacture opened this issue; 0 comments.

Has there been any research on how this strategy interacts with a learning rate schedule? I'm especially curious about extreme schedules such as the one-cycle policy (super-convergence). It seems like the history of gradient norms would be dominated by changes in the learning rate rather than by the loss landscape itself. I found this paper, which touches on the subject but doesn't propose a theory for, or a solution to, the interaction between the two.
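For context, here is a minimal sketch of the clipping rule in question, assuming the formulation from the AutoClip paper: at each step the current gradient norm is appended to a history, and the clip threshold is the p-th percentile of every norm seen so far (including the current one). The names `AutoClipSketch` and `percentile` are hypothetical and are not the autoclip repository's API; the percentile uses linear interpolation (NumPy's default convention) so the sketch needs only the standard library. The point it illustrates is that the history is never discarded, so if a schedule shifts the gradient scale over training, the threshold stays anchored to norms from earlier phases.

```python
import math
from typing import List


def percentile(values: List[float], p: float) -> float:
    """Linear-interpolation percentile (same convention as numpy's default)."""
    ordered = sorted(values)
    rank = (p / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


class AutoClipSketch:
    """Hypothetical helper: AutoClip-style percentile clipping of the gradient norm."""

    def __init__(self, clip_percentile: float = 10.0) -> None:
        self.clip_percentile = clip_percentile
        self.history: List[float] = []  # every gradient norm observed so far

    def step(self, grads: List[float]) -> List[float]:
        # L2 norm of the full (flattened) gradient vector
        norm = math.sqrt(sum(g * g for g in grads))
        self.history.append(norm)
        # Threshold = p-th percentile over the *entire* history, including this step
        threshold = percentile(self.history, self.clip_percentile)
        if norm > threshold and norm > 0.0:
            # Rescale so the clipped gradient has norm equal to the threshold
            scale = threshold / norm
            return [g * scale for g in grads]
        return list(grads)


if __name__ == "__main__":
    clipper = AutoClipSketch(clip_percentile=50.0)
    print(clipper.step([3.0, 4.0]))  # norm 5, threshold 5 -> [3.0, 4.0]
    print(clipper.step([6.0, 8.0]))  # norm 10, median of [5, 10] is 7.5 -> [4.5, 6.0]
```

Because the threshold is a percentile over all past norms, a learning-rate phase that changes typical gradient norms (for example, the high-LR middle of a one-cycle schedule) only shifts the threshold gradually, which is one way the interaction described above could arise.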

From https://hal.science/hal-03891707v1/file/Learning_rate_scheduling_and_gradient_clipping_for_audio_source_separation.pdf:

As expected, AutoClip doesn't interact well with cosine annealing

Posted by Permafacture, Jan 16 '24 18:01
