ReCU
A little question about information entropy
I am wondering why your paper uses the latent full-precision weights to calculate information entropy rather than the binarized weights. It seems to make no sense to consider the latent weights.
Hello XA23i,
I'm not the author of this work, but I believe the reasoning is tied to binarization with the sign function. Since the binarized weights only take the values ±1, their entropy is essentially fixed and carries at most one bit per weight. The latent full-precision weights, on the other hand, keep a continuous statistical distribution throughout training, which the authors observe follows a Laplacian pattern. Calculating the information entropy on the latent weights therefore gives the authors a quantity they can actually manipulate, pushing the distribution toward higher or lower entropy.