returnn
Introduce native assign_mul for more efficient weight decay
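This is relevant for an efficient implementation of decoupled weight decay, where the decay step just multiplies each weight by a scalar factor. If the weight decay is not decoupled (i.e. it is added to the loss as an L2 term), it is inefficient anyway.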
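As a rough illustration of why a multiply op is the natural fit, here is a minimal sketch. It assumes NumPy arrays as a stand-in for TF variables, and `lr`, `wd` and `decoupled_weight_decay` are illustrative names, not RETURNN API. The decoupled decay step w - lr*wd*w is the same as a single in-place multiplication by the scalar (1 - lr*wd):

```python
import numpy as np


def decoupled_weight_decay(w: np.ndarray, lr: float, wd: float) -> None:
    """Hypothetical helper: apply decoupled weight decay to w in place.

    w <- w - lr * wd * w  ==  w <- w * (1 - lr * wd)
    """
    # One in-place multiply by a Python float; NumPy broadcasts the scalar
    # over all elements, so no scaled copy of w is needed for the update.
    w *= 1.0 - lr * wd


w = np.array([1.0, -2.0, 4.0])
reference = w - 0.1 * 0.01 * w  # the "subtract a scaled copy" formulation
decoupled_weight_decay(w, lr=0.1, wd=0.01)
assert np.allclose(w, reference)
```

Without a native assign_mul, the same update has to be expressed as something like assign_sub of lr*wd*w, which first builds the scaled copy of the weights.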
Relevant TensorFlow code for reference (kernel implementation and op registration): https://github.com/tensorflow/tensorflow/blob/9959f963a0afe0a5a24cb9913998fe89169df252/tensorflow/core/kernels/resource_variable_ops.cc#L630 https://github.com/tensorflow/tensorflow/blob/27dc409fcfcc538cce7447b9637a8f727ef6a123/tensorflow/core/ops/resource_variable_ops.cc#L217
Note that our assign_mul should support broadcasting, or at least accept a scalar as the factor. A scalar factor is actually the only relevant use case for us, so it is fine to implement only that.
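A minimal sketch of the intended semantics for the scalar-only case, again with NumPy standing in for a TF resource variable. The function `assign_mul` here is hypothetical and only illustrates the behavior (multiply the variable in place by a scalar, reject anything else); it is not an existing TensorFlow or RETURNN function:

```python
import numpy as np


def assign_mul(var: np.ndarray, factor) -> np.ndarray:
    """Hypothetical scalar-only assign_mul: multiply var in place by factor.

    Only a scalar factor is accepted, since that is the only case needed
    for weight decay; anything with ndim > 0 raises ValueError.
    """
    if np.ndim(factor) != 0:
        raise ValueError(
            f"assign_mul expects a scalar factor, got shape {np.shape(factor)}")
    var *= factor  # broadcast the scalar over every element, in place
    return var


v = np.ones((2, 3), dtype=np.float32)
assign_mul(v, 0.5)  # every element of v is now 0.5
```

Restricting the factor to a scalar keeps the kernel trivial (no shape checks beyond ndim == 0), which matches the use case above.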