DeepRec

MaskNet: replace BatchNorm with LayerNorm

Open · zippeurfou opened this issue 1 year ago · 6 comments

The MaskNet paper uses LayerNorm. However, the code implementation uses BatchNorm.

zippeurfou, Apr 13 '23 23:04

Hi Marc,

Thanks for bringing this up! This is indeed a bug, and we are fixing it.

StevenShi-23, Apr 17 '23 07:04

Hi Marc,

Upon checking, this is not a bug. When BatchNorm is applied on the default axis (the last dimension), BatchNorm reduces to LayerNorm, and since the size of gamma/beta depends on the shape of the input tensor, the original implementation is still correct.

However, for clarity, we have updated the example (see PR #816).

Thanks for the comment!

StevenShi-23, Apr 18 '23 11:04

I am not sure I am following; see this screenshot: [Screenshot 2023-04-18 at 11 41 50 AM]. What am I missing?

zippeurfou, Apr 18 '23 15:04

Because your code isn't running in training mode (the training argument of tf.layers.batch_normalization defaults to False).

tf.layers.batch_normalization() calls the class BatchNormalizationBase: https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L43
tf.keras.layers.LayerNormalization() calls the class LayerNormalization: https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L898

In LayerNormalization, the mean and variance are computed by nn.moments (https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L1025), and nn.batch_normalization is then applied to produce the result (https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L1040-L1046).

BN follows the same path (moments, then nn.batch_normalization) when none of its optional features are enabled: https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L643-L652 https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L736-L739 https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L820-L825

The difference is that when you are not in training mode, BN replaces the batch mean and variance with its moving mean and moving variance: https://github.com/DeepRec-AI/DeepRec/blob/6bd822e4d05c6b2a005e58342c7661c387b417cb/tensorflow/python/keras/layers/normalization.py#L744-L750

Duyi-Wang, Apr 19 '23 02:04
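The code paths described in the comment above can be sketched in NumPy. This is a simplified illustration, not DeepRec's or TensorFlow's implementation: the helper names (normalize, layer_norm, batch_norm) are hypothetical, epsilon=1e-3 is assumed to match the TF defaults, and fused kernels and moving-average updates are omitted. Both layers compute a mean and variance and then apply the nn.batch_normalization formula; what differs is where those statistics come from.

```python
import numpy as np

EPS = 1e-3  # assumption: epsilon default of the TF normalization layers

def normalize(x, mean, var, offset, scale, eps=EPS):
    # Same formula as tf.nn.batch_normalization:
    # scale * (x - mean) / sqrt(var + eps) + offset
    return scale * (x - mean) / np.sqrt(var + eps) + offset

def layer_norm(x, gamma, beta, eps=EPS):
    # LayerNormalization (axis=-1): statistics per example, over the feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return normalize(x, mean, var, beta, gamma, eps)

def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=EPS):
    if training:
        # Training: statistics per feature, computed over the batch axis.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        # Inference: batch statistics are replaced by the stored moving averages.
        mean, var = moving_mean, moving_var
    return normalize(x, mean, var, beta, gamma, eps)

x = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
gamma, beta = np.ones(3), np.zeros(3)
moving_mean, moving_var = np.zeros(3), np.ones(3)  # default 'zeros' / 'ones' initializers

ln = layer_norm(x, gamma, beta)
bn_train = batch_norm(x, gamma, beta, moving_mean, moving_var, training=True)
bn_infer = batch_norm(x, gamma, beta, moving_mean, moving_var, training=False)
```

With the default moving statistics (mean 0, variance 1), the inference-mode BatchNorm output is essentially the raw input scaled by 1/sqrt(1 + eps), which is why it does not match LayerNorm outside of training.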

You can pass moving_mean_initializer='ones' (the default is 'zeros') and you will see that the output changes.

Duyi-Wang, Apr 19 '23 03:04
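The experiment suggested above can be reproduced with a small NumPy stand-in for inference-mode BatchNorm. Assumptions: bn_inference is a hypothetical helper, gamma=1, beta=0, the moving variance keeps its 'ones' initialization, and epsilon=1e-3. Switching the moving mean from 'zeros' to 'ones' shifts every output by 1/sqrt(1 + eps), showing that inference uses the stored moving statistics rather than the current batch.

```python
import numpy as np

EPS = 1e-3  # assumption: default epsilon of tf.layers.batch_normalization

def bn_inference(x, moving_mean, moving_var, gamma=1.0, beta=0.0, eps=EPS):
    # Inference-mode BatchNorm: only the stored moving statistics are used.
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])
default_out = bn_inference(x, moving_mean=np.zeros(3), moving_var=np.ones(3))  # 'zeros'
ones_out = bn_inference(x, moving_mean=np.ones(3), moving_var=np.ones(3))      # 'ones'

# Every element shifts by the same amount, independent of the input batch.
shift = default_out - ones_out
```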

Thanks @Duyi-Wang, it makes sense. I was confused by it as well, but the docs clearly state it. Thanks for pointing out the code. Adding a screenshot for posterity: [Screenshot 2023-04-19 at 10 45 40 AM]. Feel free to close this one.

zippeurfou, Apr 19 '23 03:04