Generalized constraints: post update hooks

Open albertz opened this issue 3 years ago • 0 comments

Currently our implemented constraints are:

  • L2 on weights (L2 option on a layer)
  • Some exotic things on activations (darc1, spatial_smoothing)

We can already decouple the constraints from the normal loss computation via decouple_constraints. In https://github.com/rwth-i6/returnn/pull/1206, this behavior will change a bit: it will then decouple only the data-independent constraints, which currently means only L2.

L2 is equivalent to weight decay when SGD is used. With the new decoupled constraints code (#1206), it explicitly does:

                return var.assign_sub(var * (l2 * 2.), use_locking=self.use_locking, read_value=False)
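As a reminder of where the factor 2 in that line comes from, here is the standard derivation for plain SGD. The symbols $w$ (parameter), $\eta$ (learning rate), $\lambda$ (the `l2` factor) and $\mathcal{L}$ (data loss) are notation introduced here, and how the learning rate enters the quoted line is not visible from this snippet alone:

$$
\begin{aligned}
\tilde{\mathcal{L}}(w) &= \mathcal{L}(w) + \lambda \lVert w \rVert^2, \\
\nabla_w \tilde{\mathcal{L}}(w) &= \nabla_w \mathcal{L}(w) + 2 \lambda w, \\
w - \eta \nabla_w \tilde{\mathcal{L}}(w) &= \bigl(w - \eta \nabla_w \mathcal{L}(w)\bigr) - \eta \cdot 2 \lambda w .
\end{aligned}
$$

So the L2 term contributes a separate subtraction proportional to $2 \lambda w$, which is the weight decay form. With adaptive optimizers the L2 gradient would be rescaled together with the data gradient, so the two are no longer the same.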

We can generalize such updates, and allow the user to perform some generic post updates on parameters.

For example, https://github.com/rwth-i6/returnn_common/issues/241 suggests extending L2 with a decay_center option. But instead of adding such an L2-specific option, we can allow the user to perform any custom post updates, similar to the code above. Then the user could easily implement such decay_center logic, and many other things as well.
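To make the idea concrete, here is a minimal sketch in plain Python, not RETURNN code. All names (`PostUpdateHook`, `decay_towards`, `sgd_step_with_hooks`) are hypothetical, lists of floats stand in for variables, and it assumes that decay_center means decaying towards a center value instead of towards zero:

```python
from typing import Callable, Dict, List

# Hypothetical hook type: takes the current parameter values, returns the new ones.
PostUpdateHook = Callable[[List[float]], List[float]]


def decay_towards(center: float, l2: float) -> PostUpdateHook:
    """Hypothetical hook: decay towards `center` instead of towards zero."""

    def hook(var: List[float]) -> List[float]:
        # Same form as var.assign_sub(var * (l2 * 2.)), with var replaced by (var - center).
        return [v - (v - center) * (l2 * 2.0) for v in var]

    return hook


def sgd_step_with_hooks(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    lr: float,
    hooks: Dict[str, List[PostUpdateHook]],
) -> None:
    """Plain SGD update in place, then the per-parameter post-update hooks in order."""
    for name in params:
        params[name] = [v - lr * g for v, g in zip(params[name], grads[name])]
        for hook in hooks.get(name, []):
            params[name] = hook(params[name])


# Usage: zero gradients, so only the hook changes the values.
params = {"scale": [1.5, 0.5]}
grads = {"scale": [0.0, 0.0]}
sgd_step_with_hooks(params, grads, lr=0.1, hooks={"scale": [decay_towards(1.0, 0.25)]})
```

With `center=0.0` the hook reduces to the plain decoupled L2 update shown above, so the existing behavior would be one special case of such a hook.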

Also related: https://github.com/rwth-i6/returnn_common/issues/90

What would the API look like on the RETURNN side? It may also be fine to support this only for the VariableLayer.

albertz, Nov 14 '22 09:11