returnn
returnn copied to clipboard
learning_rate_control_error_measure should match exactly
Currently when learning_rate_control_error_measure
(the error key) does not match exactly, it will use some heuristics. See LearningRateControl.get_error_key
.
This is also because the error key can change depending on whether there is a single loss or multiple, see LearningRateControl.set_epoch_error
.
This behavior can potentially hide user mistakes like providing an invalid error key.
This was already mentioned in https://github.com/rwth-i6/returnn_common/issues/211 where an option like learning_rate_control_error_measure_must_match
was suggested to enforce an exact match. However, after some discussion with @mmz33, we think that having a new behavior version might be a another, maybe a better option instead. What do you think @JackTemaki @patrick-wilken @michelwi?
Arguments for new behavior version:
- It's probably the behavior everyone should use. It is really the more reasonable default. The current behavior is really unexpected.
Arguments against new behavior version:
- This requires that basically everyone needs to change the setup, in a non-trivial way. It's not just the key which needs an update. It might also require changes in other parts of their pipeline. We argued before that we don't want to introduce such things even with new behavior version because then people might hesitate to switch to the new behavior version. See the discussion in #508.
Such new behavior (either via new behavior version, or via such option) would need to change both get_error_key
and set_epoch_error
.
No feedback here?
We can also first introduce such an option and later change the default of this option via a new behavior version. Or maybe also right away, and then people can either disable it to get the old behavior, or adapt it.
Despite this, there is also the inconsistency of a single vs multiple losses. In the single case, it would just store dev_score
etc, and only with multiple losses it appends the loss name. I think this option should also enable to always store the full name.
I see that storing the full name in all cases would require a new option / behaviour version. But with the current loss names, can't we already add a check that if learning_rate_control_error_measure
is set it should match one of the losses? (Maybe optionally including an "_output" postfix, that's currently in the code.) That is a bug fix rather than a new behaviour, right? Nobody sets this option and expects it to be ignored.