Generalized constraints: post update hooks
Currently our implemented constraints are:
- L2 on weights (the L2 option on a layer)
- Some exotic things on activations (darc1, spatial_smoothing)
We can already decouple the constraints from the normal loss computation via decouple_constraints. In https://github.com/rwth-i6/returnn/pull/1206, this behavior will change a bit: it will then decouple only the data-independent constraints, which currently means only L2.
L2 is equivalent to weight decay when SGD is used. With the new decoupled constraints code (#1206), it explicitly does:
return var.assign_sub(var * (l2 * 2.), use_locking=self.use_locking, read_value=False)
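To make the L2 / weight decay equivalence concrete, here is a minimal sketch in plain Python (lists instead of TF variables). The function names are made up for illustration. It assumes the decay factor is scaled by the learning rate and that the decay is computed from the pre-update weights; if the decay is instead applied to the already-updated weights, the result differs by a term of order lr² · l2.

```python
def sgd_step_with_l2_in_loss(w, grad, lr, l2):
    """SGD step where the loss contains l2 * sum(w_i ** 2).

    The L2 term adds 2 * l2 * w_i to the gradient of each weight.
    """
    return [wi - lr * (gi + 2.0 * l2 * wi) for wi, gi in zip(w, grad)]


def sgd_step_then_decoupled_decay(w, grad, lr, l2):
    """Plain SGD step on the task loss, plus a separate decay step.

    The decay mirrors var.assign_sub(var * (l2 * 2.)), with two assumptions:
    l2 is scaled by lr, and the decay uses the pre-update weights.
    """
    updated = [wi - lr * gi for wi, gi in zip(w, grad)]
    return [ui - wi * (lr * l2 * 2.0) for ui, wi in zip(updated, w)]


if __name__ == "__main__":
    w = [1.0, -2.0]
    grad = [0.5, 0.25]  # gradient of the task loss only (made-up numbers)
    print(sgd_step_with_l2_in_loss(w, grad, lr=0.1, l2=0.01))
    print(sgd_step_then_decoupled_decay(w, grad, lr=0.1, l2=0.01))
```

Both functions give the same result up to floating point error, which is why the decoupled update can replace the L2 term in the loss for plain SGD. For optimizers with per-parameter scaling such as Adam, the two are in general not equivalent.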
We can generalize such updates, and allow the user to perform some generic post updates on parameters.
For example, in https://github.com/rwth-i6/returnn_common/issues/241 it was suggested to extend L2 with some decay_center. But instead of adding such an L2-specific option, we can allow the user to perform any custom post updates, similar to the code above. Then the user could easily implement such decay_center logic, and many other things as well.
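As a rough sketch of the idea (not a proposal for the actual RETURNN API), post-update hooks could be plain callables that are applied to a parameter right after the optimizer step. Everything below is hypothetical: the names apply_updates, decay_to_center and clip_values, the dict-based parameter store, and the use of plain Python lists instead of TF variables.

```python
from typing import Callable, Dict, List

Param = List[float]
PostUpdateHook = Callable[[Param], Param]


def decay_to_center(decay: float, center: float = 0.0) -> PostUpdateHook:
    """Hook that pulls every value towards `center`.

    With center=0 and decay=2*l2 this matches the decoupled L2 update;
    a non-zero center gives the decay_center variant.
    """
    def hook(param: Param) -> Param:
        return [p - decay * (p - center) for p in param]
    return hook


def clip_values(limit: float) -> PostUpdateHook:
    """Hook that clips every value to [-limit, limit], as another example."""
    def hook(param: Param) -> Param:
        return [max(-limit, min(limit, p)) for p in param]
    return hook


def apply_updates(
    params: Dict[str, Param],
    grads: Dict[str, Param],
    lr: float,
    post_update_hooks: Dict[str, List[PostUpdateHook]],
) -> Dict[str, Param]:
    """Plain SGD step, then the user-defined hooks of each parameter, in order."""
    new_params = {}
    for name, param in params.items():
        updated = [p - lr * g for p, g in zip(param, grads[name])]
        for hook in post_update_hooks.get(name, []):
            updated = hook(updated)
        new_params[name] = updated
    return new_params


if __name__ == "__main__":
    params = {"w": [1.0, 3.0]}
    grads = {"w": [0.0, 0.0]}
    hooks = {"w": [decay_to_center(0.5, center=1.0)]}
    print(apply_updates(params, grads, lr=0.1, post_update_hooks=hooks))
```

Keeping the hooks per parameter means the optimizer does not need to know anything about L2, decay_center, or any other specific constraint.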
Also related: https://github.com/rwth-i6/returnn_common/issues/90
What would the API look like on the RETURNN side? It may also be OK to support this only for the VariableLayer.