XNOR-Net-PyTorch
About your gradient
First of all, thank you very much for open-sourcing your XNOR-Net PyTorch code. I noticed that when updating the full-precision weights, you multiply the weight gradients by several coefficients:
self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)
I could not find any description of these coefficients in the original paper, so I would like to ask why you transform the gradients in this way.
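For reference, here is a minimal standalone sketch of the weight-gradient formula as I read it in the XNOR-Net paper (the function name and shape handling below are my own assumptions, not the repo's code). The two quoted lines appear to correspond to these two terms (`m` and `m_add`), with the extra `(1.0-1.0/s[1])`, `n`, and `1e+9` factors applied on top:

```python
import torch

def xnor_weight_grad(weight, grad_wrt_binarized):
    """Sketch of the XNOR-Net gradient w.r.t. the real-valued weights W.

    Binarization: Wb = alpha * sign(W), with alpha = ||W||_1 / n per output filter.
    With the straight-through estimator (clipped where |W| > 1):
        dC/dW_i = dC/dWb_i * alpha * 1[|W_i| <= 1]
                  + sign(W_i) / n * sum_j(dC/dWb_j * sign(W_j))
    """
    s = weight.size()
    n = weight[0].nelement()                  # elements per output filter
    g = grad_wrt_binarized
    flat_shape = (s[0],) + (1,) * (len(s) - 1)

    # alpha: per-filter mean absolute value, broadcast back to the weight shape
    alpha = weight.abs().view(s[0], -1).mean(dim=1).view(flat_shape).expand(s)

    # term through sign(W): alpha * g, zeroed where |W| > 1 (STE clipping)
    term_sign = alpha * g * (weight.abs() <= 1.0).float()

    # term through alpha: sign(W) * mean_over_filter(sign(W) * g)
    per_filter = (weight.sign() * g).view(s[0], -1).mean(dim=1).view(flat_shape).expand(s)
    term_alpha = weight.sign() * per_filter

    return term_sign + term_alpha
```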
I would also like to know. If the original poster has figured it out, could you please explain? Thanks.
Hi @BobxmuMa @zhaoxiangshun, the parameter 1e+9 appears in the paper author's original repo, so I kept it as well. Its main effect is to increase the range of the weights and reduce the effect of weight decay. I suppose using a much smaller weight decay value would have the same effect. I also tested the accuracy with and without this parameter; in my tests, accuracy was higher when using it.
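To make the weight-decay point concrete, here is a small sketch (toy numbers of my own, not the actual training hyperparameters): with plain SGD and no momentum, scaling the gradient by a constant k gives exactly the same update as multiplying the learning rate by k and dividing the weight decay by k, so the decay term's relative influence shrinks.

```python
import torch

# Toy demonstration: scaling the gradient by k before SGD-with-weight-decay
# is the same update as using lr*k together with weight_decay/k.
torch.manual_seed(0)
k, lr, wd = 1e9, 1e-10, 1e-4   # illustrative values only

w0 = torch.randn(5)
g = torch.randn(5)

# variant A: gradient scaled by k, original lr and weight decay
wA = w0.clone()
wA -= lr * (k * g + wd * wA)

# variant B: unscaled gradient, lr*k and weight_decay/k
wB = w0.clone()
wB -= (lr * k) * (g + (wd / k) * wB)

print(torch.allclose(wA, wB))   # True: the two updates coincide
```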