
learning_rate_control_error_measure should match exactly

[Open] albertz opened this issue · 3 comments

Currently, when learning_rate_control_error_measure (the error key) does not match any stored key exactly, some heuristics are used instead; see LearningRateControl.get_error_key.

This is also because the error key can change depending on whether there is a single loss or multiple losses; see LearningRateControl.set_epoch_error.

This behavior can potentially hide user mistakes like providing an invalid error key.
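
To illustrate what this looks like, here is a simplified sketch of such a heuristic fallback (hypothetical code, not the actual RETURNN implementation of LearningRateControl.get_error_key):

```python
# Simplified illustration of the kind of heuristic fallback described above:
# if the configured error measure does not exist as a stored key,
# some related key is silently picked instead.

def get_error_key_heuristic(wanted_key, available_keys):
    """Pick an error key, falling back to heuristics if `wanted_key` is absent."""
    if wanted_key in available_keys:
        return wanted_key
    # Heuristic fallback: try keys that merely contain the wanted key,
    # e.g. "dev_error" might silently resolve to "dev_error_output".
    candidates = [k for k in available_keys if wanted_key in k]
    if candidates:
        return sorted(candidates)[0]
    # Last resort: just take any dev key, which hides a possible user mistake.
    dev_keys = [k for k in available_keys if k.startswith("dev_")]
    return sorted(dev_keys)[0] if dev_keys else None


print(get_error_key_heuristic("dev_error_ctc", {"dev_score_output", "dev_error_output"}))
# -> "dev_error_output", even though "dev_error_ctc" was an invalid key (e.g. a typo)
```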

This was already mentioned in https://github.com/rwth-i6/returnn_common/issues/211, where an option like learning_rate_control_error_measure_must_match was suggested to enforce an exact match. However, after some discussion with @mmz33, we think that a new behavior version might be another, maybe better, option instead. What do you think @JackTemaki @patrick-wilken @michelwi?

Arguments for new behavior version:

  • It's probably the behavior everyone should use, and really the more reasonable default. The current behavior is quite unexpected.

Arguments against new behavior version:

  • This requires that basically everyone needs to change their setup, in a non-trivial way. It is not just the key that needs an update; it might also require changes in other parts of their pipeline. We argued before that we don't want to introduce such things even with a new behavior version, because then people might hesitate to switch to the new behavior version. See the discussion in #508.

Such new behavior (either via new behavior version, or via such option) would need to change both get_error_key and set_epoch_error.
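
As a rough sketch of what that could look like (hypothetical code and placeholder option/version values, not the actual implementation), gating both places behind the suggested option or a behavior-version check:

```python
# Rough sketch of gating both get_error_key and set_epoch_error behind the
# suggested learning_rate_control_error_measure_must_match option, or
# alternatively a new behavior version.

class LearningRateControlSketch:
    def __init__(self, config):
        # `config` is assumed to be a plain dict here.
        self.error_measure = config.get("learning_rate_control_error_measure")
        # Either an explicit option or a behavior-version check could flip this.
        self.must_match = config.get(
            "learning_rate_control_error_measure_must_match",
            config.get("behavior_version", 0) >= 99,  # placeholder version number
        )
        self.epoch_data = {}  # epoch -> {error_key: value}

    def set_epoch_error(self, epoch, errors):
        # New behavior: always store the keys exactly as reported, no renaming
        # depending on whether there is one loss or several.
        self.epoch_data.setdefault(epoch, {}).update(errors)

    def get_error_key(self, epoch):
        available = self.epoch_data.get(epoch, {})
        if self.error_measure in available:
            return self.error_measure
        if self.must_match:
            raise ValueError(
                f"learning_rate_control_error_measure={self.error_measure!r} "
                f"not found, available keys: {sorted(available)}"
            )
        # Old behavior: fall back to heuristics (omitted here).
        ...
```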

albertz · Oct 14 '22 12:10

No feedback here?

We could also first introduce such an option and later change its default via a new behavior version. Or maybe change the default right away; then people can either disable the option to get the old behavior, or adapt their setup.

albertz · Nov 15 '22 09:11

Apart from this, there is also the inconsistency between a single loss and multiple losses. In the single-loss case, it would just store dev_score etc., and only with multiple losses does it append the loss name. I think this option should also make it possible to always store the full name.
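
For illustration, a small sketch (hypothetical helper, not the actual RETURNN code) of what "always store the full name" would mean:

```python
# Sketch of the "always store the full name" idea: even with a single loss,
# keep e.g. "dev_score_output" instead of collapsing it to just "dev_score",
# so the key is consistent with the multi-loss case.

def make_score_keys(dataset_name, losses, always_full_name=True):
    """Build score key names for the given losses of one dataset."""
    keys = {}
    for loss_name, value in losses.items():
        if len(losses) == 1 and not always_full_name:
            key = f"{dataset_name}_score"               # old single-loss behavior
        else:
            key = f"{dataset_name}_score_{loss_name}"   # consistent full name
        keys[key] = value
    return keys


print(make_score_keys("dev", {"output": 0.42}, always_full_name=False))  # {'dev_score': 0.42}
print(make_score_keys("dev", {"output": 0.42}))                          # {'dev_score_output': 0.42}
```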

albertz · Nov 15 '22 09:11

I see that storing the full name in all cases would require a new option / behaviour version. But with the current loss names, can't we already add a check that, if learning_rate_control_error_measure is set, it must match one of the losses? (Maybe optionally including the "_output" postfix that's currently in the code.) That is a bug fix rather than a new behaviour, right? Nobody sets this option and expects it to be ignored.
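
A minimal sketch of such a check (hypothetical helper and key naming, not the actual RETURNN code):

```python
# Sketch of the proposed check: if learning_rate_control_error_measure is set,
# it must correspond to one of the known loss names, optionally with an
# "_output" postfix variant.

def check_error_measure(error_measure, loss_names):
    """Raise if the configured error measure does not match any known loss."""
    if not error_measure:
        return  # nothing configured, nothing to check
    allowed = set()
    for name in loss_names:
        for ds in ("train", "dev", "eval"):
            for kind in ("score", "error"):
                allowed.add(f"{ds}_{kind}_{name}")
                allowed.add(f"{ds}_{kind}_{name}_output")  # optional postfix variant
    if error_measure not in allowed:
        raise ValueError(
            f"learning_rate_control_error_measure={error_measure!r} does not match "
            f"any loss, expected one of: {sorted(allowed)}"
        )


check_error_measure("dev_score_ce", ["ce"])      # passes silently
try:
    check_error_measure("dev_score_cee", ["ce"])  # typo in the loss name
except ValueError as exc:
    print(exc)
```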

patrick-wilken · Mar 29 '23 16:03