WeightStandardization
changing the position of epsilon
Hi, siyuan:
According to the discussion in this article https://zhuanlan.zhihu.com/p/91926094 and the follow-up exchange there between Feng Wang and the authors, it is suggested to implement WS with epsilon in a different position. As Feng Wang suggested, epsilon should go inside the sqrt() of the std, i.e. sqrt(var + eps) rather than sqrt(var) + eps. Alternatively, you can adopt a slightly more complicated solution: the epsilon-shifted L2 regularizer proposed in https://arxiv.org/pdf/1911.05920.pdf .
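To make the suggestion concrete, here is a minimal, framework-free sketch of per-filter weight standardization with epsilon inside the square root. It is not the repo's code. The function name `standardize_filters`, the flattened list-of-filters layout, eps = 1e-5, and the use of the population (biased) variance are all assumptions for illustration; PyTorch's `Tensor.std` defaults to the unbiased estimator.

```python
import math

def standardize_filters(weight, eps=1e-5):
    """Hypothetical helper: Weight Standardization with eps inside sqrt().

    weight: list of output filters, each flattened to a list of floats
            (in_channels * kernel_h * kernel_w values per filter).
    Returns filters with zero mean and approximately unit variance.
    """
    out = []
    for filt in weight:
        n = len(filt)
        mean = sum(filt) / n
        # Population variance (an assumption; an unbiased estimate divides by n - 1).
        var = sum((x - mean) ** 2 for x in filt) / n
        # eps inside the sqrt: the denominator is never smaller than sqrt(eps),
        # so even a constant filter (var == 0) is divided by a positive number.
        std = math.sqrt(var + eps)
        out.append([(x - mean) / std for x in filt])
    return out

w = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
ws = standardize_filters(w)
print(ws[1])  # constant filter maps to zeros instead of 0 / 0
```

In PyTorch the equivalent change is to compute the per-filter variance and apply torch.sqrt to var + eps, instead of adding eps to the output of .std().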
Thanks for the suggestions!
We've also encountered NaNs in some of our later experiments. We will update this repo in the next major update.
I'm reading the "Understanding the Disharmony" paper. Very interesting and solid work!
Thanks.
Yes, definitely: epsilon should be inside the sqrt() bracket. torch.sqrt is a major source of NaNs.
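A minimal sketch of why the placement matters in the backward pass, assuming a filter whose weights have collapsed to a constant (variance 0). It looks only at the gradient of the WS denominator, not the full layer, and the helper names `dstd_dvar` and `grad_wrt_weight` are hypothetical. With eps outside, d sqrt(var)/d var = 1 / (2 sqrt(var)) is infinite at var = 0, and the chain rule multiplies it by d var/d w = 0, giving 0 * inf = NaN. With eps inside, the factor stays finite.

```python
import math

def dstd_dvar(var, eps, eps_inside):
    """Derivative of the WS denominator w.r.t. the per-filter variance.

    eps outside: denom = sqrt(var) + eps  ->  1 / (2 * sqrt(var))
    eps inside:  denom = sqrt(var + eps)  ->  1 / (2 * sqrt(var + eps))
    """
    root = math.sqrt(var + eps) if eps_inside else math.sqrt(var)
    if root == 0.0:
        # Mimic IEEE-754 float semantics (as in PyTorch): 1 / 0 -> inf.
        return math.inf
    return 1.0 / (2.0 * root)

def grad_wrt_weight(w, eps, eps_inside):
    """Chain rule through the population variance: d denom / d w_i."""
    n = len(w)
    mean = sum(w) / n
    var = sum((x - mean) ** 2 for x in w) / n
    dvar_dw = [2.0 * (x - mean) / n for x in w]  # all zeros for a constant filter
    d = dstd_dvar(var, eps, eps_inside)
    return [g * d for g in dvar_dw]  # 0.0 * inf -> nan in Python floats

dead_filter = [0.0, 0.0, 0.0]
print(grad_wrt_weight(dead_filter, 1e-5, eps_inside=False))  # NaNs
print(grad_wrt_weight(dead_filter, 1e-5, eps_inside=True))   # zeros
```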