WeightStandardization

changing the position of epsilon

Open implus opened this issue 5 years ago • 2 comments

Hi, siyuan:

Following the discussion in this article https://zhuanlan.zhihu.com/p/91926094 and the follow-up exchange there between Feng Wang and the authors, I suggest changing the position of epsilon in the WS implementation. As Feng Wang points out, epsilon should be inside the sqrt() used to compute the std. Alternatively, you could take a slightly more complicated route and introduce the epsilon-shifted L2 regularizer proposed in https://arxiv.org/pdf/1911.05920.pdf.
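To make the two placements of epsilon concrete, here is a minimal sketch in plain Python for a single filter's flattened weights. It is not the repository's code: the function name `standardize`, the `eps_inside` flag, the default `eps=1e-5`, and the use of the biased variance are all assumptions made for illustration.

```python
import math

def standardize(weights, eps=1e-5, eps_inside=True):
    """Standardize one filter's flattened weights (hypothetical helper).

    eps_inside=True  -> divide by sqrt(var + eps)  (placement suggested here)
    eps_inside=False -> divide by sqrt(var) + eps  (epsilon added after the sqrt)
    """
    n = len(weights)
    mean = sum(weights) / n
    # Biased (population) variance; this choice is an assumption.
    var = sum((w - mean) ** 2 for w in weights) / n
    if eps_inside:
        denom = math.sqrt(var + eps)
    else:
        denom = math.sqrt(var) + eps
    return [(w - mean) / denom for w in weights]

# When the variance is far from zero, both placements give nearly the same result.
print(standardize([0.5, -0.5, 1.5, -1.5], eps_inside=True))
print(standardize([0.5, -0.5, 1.5, -1.5], eps_inside=False))
```

The forward values barely differ. The placement matters mainly for the backward pass when the variance approaches zero.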

implus, Nov 15 '19 06:11

Thanks for the suggestions!

We've also encountered some NaNs in other experiments we ran later. We will update this repo when doing the next major update.

I'm reading the Understanding the Disharmony paper. Very interesting and solid work!

Thanks.

joe-siyuan-qiao, Nov 15 '19 16:11

Yes, definitely: epsilon should be inside the sqrt() bracket. torch.sqrt is a major source of NaNs.
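One way to see why the sqrt is the weak point: the derivative of sqrt(v) is 1 / (2 * sqrt(v)), which is unbounded as v approaches 0, while the derivative of sqrt(v + eps) is capped at 1 / (2 * sqrt(eps)). The sketch below writes those two derivatives by hand in plain Python instead of calling autograd. The helper names, `eps=1e-5`, and the zero upstream gradient are assumptions chosen to mimic a filter whose weights are all identical.

```python
import math

def dsqrt_eps_outside(var):
    """d/dvar of (sqrt(var) + eps) = 1 / (2 * sqrt(var)).

    Python raises ZeroDivisionError on float division by zero, so the
    IEEE 754 result (inf) is returned explicitly for var == 0.
    """
    root = math.sqrt(var)
    return math.inf if root == 0.0 else 1.0 / (2.0 * root)

def dsqrt_eps_inside(var, eps=1e-5):
    """d/dvar of sqrt(var + eps) = 1 / (2 * sqrt(var + eps)); finite for var >= 0."""
    return 1.0 / (2.0 * math.sqrt(var + eps))

var = 0.0       # zero variance: every weight equals the mean
upstream = 0.0  # hypothetical upstream gradient arriving at the sqrt node

# Chain rule: upstream gradient times local derivative.
print(upstream * dsqrt_eps_outside(var))  # 0 * inf is nan under IEEE 754
print(upstream * dsqrt_eps_inside(var))   # finite local derivative, product stays 0.0
```

With epsilon outside the sqrt, a zero upstream gradient meets an infinite local derivative and the product is NaN. With epsilon inside, the local derivative stays finite.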

MohitLamba94, Sep 03 '21 06:09