
setting different optimization parameters for different layers / different epochs

Open denizyuret opened this issue 5 years ago • 5 comments

Current design:

  • parameters have an opt field specifying the optimization algorithm/options.
  • by default these are initialized as nothing.
  • a call to a minimizer (sgd, adam, etc.) sets any opt field that is nothing to the specified optimizer.
  • however, it does not touch opt fields that are already set.
  • this gives the user the freedom to set the optimization algorithm/options differently for different layers if needed.
  • it also supports stopping/restarting training without changing the optimization algorithm/options/state.
  • however, if the user wants to, say, reduce the learning rate for the tenth epoch and calls sgd with a new learning rate, the new rate is silently ignored: sgd sees the opt fields already set, does not change them, and keeps using the initially specified learning rates.

Since reducing the learning rate is a frequent practice, the current design needs to change.
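The silent-ignore behavior described above can be illustrated with a minimal sketch. `MockParam`, `SGDOpt`, and `sgd_setup!` are hypothetical stand-ins for illustration, not Knet's actual implementation; they only mimic the "set opt if nothing, otherwise leave it" rule:

```julia
# Hypothetical mock of the opt-field pattern (NOT Knet's actual code).
mutable struct SGDOpt
    lr::Float64
end

mutable struct MockParam
    value::Vector{Float64}
    opt::Union{Nothing,SGDOpt}
end

function sgd_setup!(p::MockParam; lr=0.1)
    if p.opt === nothing
        p.opt = SGDOpt(lr)   # first call: install the optimizer
    end                      # later calls: existing opt is left as-is
    return p
end

p = MockParam([0.0], nothing)
sgd_setup!(p; lr=0.1)
sgd_setup!(p; lr=0.01)   # silently ignored: opt is already set
p.opt.lr                 # still 0.1
```

This is exactly the trap the issue describes: the second call looks like it changed the learning rate, but the existing opt field wins.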

denizyuret avatar Mar 25 '19 12:03 denizyuret

Currently, I'd like to change the learning rate. Is creating a new model and copying the old params over for training with the new learning rate the correct approach?

ngphuoc avatar Apr 02 '19 11:04 ngphuoc

You don’t need to copy them. This is the way I apply learning rate decay:

https://github.com/ekinakyurek/Morse.jl/blob/a9190261d480b56a9f97c66ece8193602106b315/src/util.jl#L84
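The idea in the linked util.jl, as described, is to mutate the optimizer state in place rather than rebuild the model. A hedged sketch, using hypothetical `DemoOpt`/`DemoParam` stand-ins so it is self-contained; with Knet you would iterate over the model's parameters and mutate each `p.opt.lr` the same way:

```julia
# Demo types standing in for a parameter with an attached optimizer.
mutable struct DemoOpt
    lr::Float64
end

struct DemoParam
    opt::DemoOpt
end

# Multiply every parameter's learning rate by `factor`, in place.
function decay_lr!(ps, factor)
    for p in ps
        p.opt.lr *= factor   # no copying of weights or optimizer state
    end
    return ps
end

ps = [DemoParam(DemoOpt(0.1)), DemoParam(DemoOpt(0.1))]
decay_lr!(ps, 0.5)
ps[1].opt.lr   # ≈ 0.05
```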


ekinakyurek avatar Apr 02 '19 11:04 ekinakyurek

Thanks! That would be simple enough, given that I have a callback at every specified epoch/step in my train! function.
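The callback approach could be sketched as follows. This is a hypothetical outline, not the actual train! function; `lr` stands in for the opt fields a real training loop would mutate:

```julia
# Hypothetical epoch loop with a schedule callback: halve the learning
# rate every `step` epochs and record the rate used at each epoch.
function run_epochs(nepochs, lr, step)
    lrs = Float64[]
    for epoch in 1:nepochs
        epoch % step == 0 && (lr /= 2)   # callback fires on schedule
        push!(lrs, lr)                   # stand-in for the training step
    end
    return lrs
end

run_epochs(4, 0.1, 2)   # → [0.1, 0.05, 0.05, 0.025]
```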

ngphuoc avatar Apr 02 '19 12:04 ngphuoc

This is related to issue 438: update! does not seem to work on truncated weights/gradients.

s271 avatar Apr 15 '19 07:04 s271

It would be ideal to be able to pass a function as the lr parameter with the form lr = f(epoch).

Alternatively, the approach darknet uses, where a learning-rate decay is specified by a shape (e.g. exponential, polynomial) plus some shape parameters (e.g. a power-law index), could work. Here are some examples:

learning_rate=0.01
policy=poly
power=4

learning_rate=0.1
policy=steps
steps=1000,1500
scales=.1,.1
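Both policies above fit the lr = f(epoch) form suggested earlier. A sketch of the two as functions of the iteration count; the formulas follow darknet's commonly described behavior (poly: base · (1 − it/max_it)^power; steps: multiply by each scale once its step is passed), but verify against darknet's source before relying on them:

```julia
# Polynomial decay: lr shrinks from `base` to 0 over `max_it` iterations.
poly_lr(base, power, it, max_it) = base * (1 - it / max_it)^power

# Step decay: apply each scale factor once the iteration passes its step.
function steps_lr(base, steps, scales, it)
    lr = base
    for (s, sc) in zip(steps, scales)
        it >= s && (lr *= sc)
    end
    return lr
end

poly_lr(0.01, 4, 500, 1000)                      # ≈ 0.000625
steps_lr(0.1, [1000, 1500], [0.1, 0.1], 1200)    # ≈ 0.01
```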

davidssmith avatar May 01 '19 14:05 davidssmith