
Removing redundant learning_rate(s) parameter

Open · JackTemaki opened this issue 3 years ago · 8 comments

Discussion related to #508

Right now there are both a learning_rate and a learning_rates parameter, which often confuses users. I am not sure what the better solution would be (i.e. which one to keep), but the current situation is not good.

JackTemaki · May 21 '21 08:05

They are not exactly redundant or the same. learning_rate defines the initial and default learning rate.

learning_rates sets predefined learning rates for some specified epochs.
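
For illustration, a minimal config sketch with made-up values (how exactly the two options interact is part of what is being discussed here):

```python
# Made-up example values, just to illustrate the two options.

# Initial/default learning rate.
learning_rate = 0.001

# Predefined learning rates for some specified epochs
# (here e.g. a short warmup over the first three epochs).
learning_rates = {1: 0.0001, 2: 0.0005, 3: 0.001}
```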

albertz · May 21 '21 08:05

can the different behaviours be merged into a single variable? maybe something like: learning_rates = {"initial": FLOAT, "default": FLOAT, "predefined": LIST}

christophmluscher · May 21 '21 09:05

Currently learning_rates is either a list of LRs or a dict epoch -> LR (so a sparse list of LRs). I'm not sure if extending/changing learning_rates to also cover the initial/default LR is really in any way helpful (or easier) for the user than to just specify it separately. I personally think learning_rate as a separate option is cleaner.
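
To make the two accepted forms concrete, a sketch (values made up; assuming the list form is indexed by epoch, starting at epoch 1):

```python
# As a list: one LR per epoch, starting at epoch 1.
learning_rates = [0.0001, 0.0005, 0.001]

# As a dict epoch -> LR: only the listed epochs are pinned,
# i.e. the "sparse list of LRs" mentioned above.
learning_rates = {1: 0.0001, 5: 0.0005, 10: 0.001}
```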

albertz · May 21 '21 09:05

I think @JackTemaki is right in the sense that the two overlap when learning_rates also specifies the LR of epoch 1. However, I don't see this as too much of a problem. And this also does not necessarily need to be the case.

Maybe there can be an extra check (independent of any new behavior) whether learning_rate has no effect and differs from max(learning_rates.values()) or so, and then just print a warning.
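
A rough sketch of such a warning (a hypothetical helper, not actual RETURNN code; assumes a dict-like config):

```python
def warn_unused_learning_rate(config):
    """Hypothetical check: warn if learning_rate looks shadowed by learning_rates."""
    lr = config.get("learning_rate")
    lrs = config.get("learning_rates")
    if lr is None or not lrs:
        return
    # Normalize the list form to a dict epoch -> LR (1-based epochs assumed).
    if isinstance(lrs, (list, tuple)):
        lrs = {i + 1: v for i, v in enumerate(lrs)}
    # Heuristic from above: if learning_rate differs from
    # max(learning_rates.values()), it probably has no effect.
    if lr != max(lrs.values()):
        print("WARNING: learning_rate=%s might have no effect "
              "(learning_rates max is %s)." % (lr, max(lrs.values())))
```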

I think I would still leave learning_rate as a config option.

albertz · May 21 '21 09:05

But then we are still left with learning_rate and learning_rates, which is exactly what @JackTemaki wanted to change. Merging this into a single variable would not take away your options but mainly increase readability since the keywords add information. That was my reasoning.

christophmluscher · May 21 '21 09:05

But then we are still left with learning_rate and learning_rates,

Yes, this is what I meant. I would prefer to keep both.

Merging this into a single variable would not take away your options but mainly increase readability since the keywords add information.

But what new information? It's exactly the same information, just expressed in a different way (which I think is worse than before). And you still have the same ambiguity problem, e.g.:

learning_rates = {"initial": 1.0, "default": 0.5, 1: 0.3}

When does it use 'initial', 'default' or the one from epoch 1?

Actually, I would also need to read the code to understand exactly which one is used in which case. I'm not really sure right now whether learning_rate is ever used when learning_rates specifies epoch 1.


Another suggestion:

I would still keep both options, but only allow one of them by the user. If learning_rates is used, it must define the LR of the first epoch.
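
A sketch of how that restriction could be checked (again a hypothetical helper, not actual RETURNN code):

```python
def check_lr_options(config):
    """Hypothetical check for the proposal: allow only one of the two options,
    and require learning_rates to define the LR of epoch 1."""
    if "learning_rate" in config and "learning_rates" in config:
        raise ValueError("Specify either learning_rate or learning_rates, not both.")
    lrs = config.get("learning_rates")
    if isinstance(lrs, dict) and 1 not in lrs:
        raise ValueError("learning_rates must define the LR of the first epoch.")
```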

albertz · May 21 '21 09:05

I would still keep both options, but only allow one of them by the user. If learning_rates is used, it must define the LR of the first epoch.

This sounds like a good compromise. I was not even aware that learning_rates can have this kind of dict syntax; I guess this should be updated in the docs at some point as well.

JackTemaki · May 21 '21 09:05

Yes, this is what I meant. I would prefer to keep both.

OK.

When does it use 'initial', 'default' or the one from epoch 1?

Good point.

I would still keep both options, but only allow one of them by the user. If learning_rates is used, it must define the LR of the first epoch.

I like this.

christophmluscher · May 21 '21 09:05