Applying Weight Normalization to Single Vector Settings
Hey Tim,
Thanks for publishing this code; it has been very useful for implementation purposes.
I wanted to ask you about applying weight norm to single-vector settings.
For example, could you apply weight normalization to parametric ReLU (PReLU)? In that case alpha is a 1-D vector, so you would take the norm of the whole vector and learn only a single scalar g for the entire vector alpha.
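To make that concrete, here is a rough numpy sketch of what I have in mind (my own code, not from your repo; the names are just illustrative):

```python
import numpy as np

def weight_normed_prelu(x, v, g, eps=1e-8):
    """x: (batch, channels); v: (channels,) direction vector; g: a single scalar
    shared by the whole slope vector, as described above."""
    alpha = g * v / (np.linalg.norm(v) + eps)   # alpha = g * v / ||v||
    return np.maximum(x, 0.0) + alpha * np.minimum(x, 0.0)

# Usage: initialize v at the usual PReLU slope of 0.25 and g at ||v||,
# so alpha starts out identical to a standard PReLU.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
v = np.full(3, 0.25)
g = np.linalg.norm(v)
y = weight_normed_prelu(x, v, g)
```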
Another example is multiplicative integration: https://arxiv.org/abs/1606.06630
The idea there is that learnable coefficients alpha and beta are multiplied element-wise with the outputs of the weight matrices. Would it be advantageous to apply weight normalization to these learnable parameters as well?
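For reference, here is a rough numpy sketch of the multiplicative-integration building block from that paper, so it's clear where alpha and beta sit (again my own code, roughly following the paper's notation):

```python
import numpy as np

def mi_preactivation(x, z, W, U, alpha, beta1, beta2, b):
    """x: (d_in,) input; z: (d_h,) state; W: (d_out, d_in); U: (d_out, d_h);
    alpha, beta1, beta2, b: (d_out,)."""
    Wx = W @ x
    Uz = U @ z
    # second-order term gated by alpha plus first-order terms gated by beta1/beta2
    return alpha * Wx * Uz + beta1 * Uz + beta2 * Wx + b
```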
I really like weight norm compared to BN because of its simplicity, so I'm hoping to maximize its benefits.
Yes, I think weight norm should also work well in those cases. For multiplicative integration you will probably not want to use an additional scale parameter 'g' for the weight matrices, but instead only use the alpha and beta scale parameters of that method.
Thanks @TimSalimans, I will not use the g scale parameter. So to be clear, I should still divide by the Euclidean norm of the weight matrix (neuron-wise)?
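In other words, something like this sketch (my own reading of your suggestion, not code from the repo), where each weight-matrix row is divided by its Euclidean norm and alpha/beta remain the only scales?

```python
import numpy as np

def row_normalize(V, eps=1e-8):
    """Divide each row (one row per output neuron) by its Euclidean norm; no extra g."""
    return V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)

def mi_preactivation_wn(x, z, V_w, V_u, alpha, beta1, beta2, b):
    Wx = row_normalize(V_w) @ x   # W = V_w / ||V_w|| per row, scaled only through alpha/beta
    Uz = row_normalize(V_u) @ z
    return alpha * Wx * Uz + beta1 * Uz + beta2 * Wx + b
```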