addons Group normalization documentation is incorrect

Open albertz opened this issue 5 years ago • 12 comments

Describe the bug

This is purely about the documentation.

In the documentation about group normalization, it is stated:

Relation to Layer Normalization: If the number of groups is set to 1, then this operation becomes identical to Layer Normalization.

However, that is not true.

Assume an input tensor x of shape [B,T,F] (batch, time, feature-dim) (time could also be H/W instead; feature-dim can also be the channels).

In layer normalization, the mean you calculate is:

mean = reduce_mean(x, axis=-1, keepdims=True)  # shape [B,T,1]

You normalize just over the feature axis.

In group normalization with G=1 (so the group axis has size 1 and can be ignored), the mean you calculate is:

mean = reduce_mean(x, axis=[1,2], keepdims=True)  # shape [B,1,1]

You normalize over all axes except the batch axis and the newly added group axis (which has size 1 when G=1, so it makes no difference).
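
To make the difference concrete, here is a minimal NumPy sketch of the two reductions described above. It is an illustration under assumptions, not the addons implementation: the function names `layer_norm` and `group_norm` are hypothetical, the learnable scale/offset parameters are omitted, and the group split is assumed to be a reshape of the feature axis into `[G, F // G]`.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Statistics over the feature axis only: mean/var have shape [B, T, 1].
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def group_norm(x, groups=1, eps=1e-5):
    # Assumed layout: split F into [G, F // G], then reduce over every
    # axis except batch and group, i.e. over T and F // G.
    b, t, f = x.shape
    xg = x.reshape(b, t, groups, f // groups)
    mean = xg.mean(axis=(1, 3), keepdims=True)  # shape [B, 1, G, 1]
    var = xg.var(axis=(1, 3), keepdims=True)
    return ((xg - mean) / np.sqrt(var + eps)).reshape(b, t, f)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))  # [B, T, F]

ln_out = layer_norm(x)
gn_out = group_norm(x, groups=1)

# General case (T > 1): the two normalizations give different results.
same_general = np.allclose(ln_out, gn_out)
# Special case T == 1: reducing over T changes nothing, so they coincide.
same_t1 = np.allclose(layer_norm(x[:, :1]), group_norm(x[:, :1], groups=1))
print(same_general, same_t1)  # False True
```

In this sketch the two only agree when the time (or spatial) axis has length 1, which is consistent with the point that G=1 group normalization matches a layer normalization over all non-batch axes, not one over the feature axis alone.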

Or am I misunderstanding something? I ask because the same incorrect statement appears in the original group normalization paper.

The figure from the paper (also here) is also misleading: in this figure, it looks like layer normalization normalizes over H/W as well. But this is not the case (at least not commonly, and not with the default options). So the figure is wrong about layer normalization (it would normalize just over C, not H/W). But the figure is correct for group normalization as you have implemented it (it normalizes over all axes except N/G).

I also formulated the question here.

Sep 02 '20 11:09 albertz
