
weight decay for the resweight?

Open Kyeongpil opened this issue 3 years ago • 2 comments

Hello, I read the paper, and it is interesting to me. I have a question.

Many implementations, including Hugging Face's, exclude LayerNorm and bias parameters from weight decay for convergence (https://github.com/huggingface/transformers/issues/492). Would it also help to exclude the resweight parameters from weight decay?

Kyeongpil, Nov 24 '20 02:11

Yes, it would seem reasonable to not decay resweights since other parameters are already being decayed.

calclavia, Nov 28 '20 05:11
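
For anyone who wants to try this, here is a minimal sketch of splitting parameters into a decayed and a non-decayed group by name. It is an illustration only, not code from the rezero repository: the helper name `split_decay_groups`, the default `weight_decay=0.01`, and the assumption that the relevant parameter names contain the substrings `bias`, `LayerNorm.weight`, or `resweight` are all hypothetical and should be checked against your own model's parameter names.

```python
def split_decay_groups(named_params, weight_decay=0.01,
                       no_decay_keys=("bias", "LayerNorm.weight", "resweight")):
    """Split (name, param) pairs into two optimizer parameter groups.

    Parameters whose name contains any substring in `no_decay_keys`
    (an assumed naming convention) get weight_decay=0.0; all others
    get `weight_decay`.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if any(key in name for key in no_decay_keys):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Hypothetical usage with PyTorch (assumes `model` is an nn.Module):
#   import torch
#   groups = split_decay_groups(model.named_parameters())
#   optimizer = torch.optim.AdamW(groups, lr=1e-4)
```

The returned list of dicts follows the per-parameter-group format that PyTorch optimizers accept, so the resweights can be left undecayed without changing anything else in the training loop. Whether this actually improves convergence is not established in this thread.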

@calclavia I have the same question, but did this prove to be better? Or is it just to speed up calculations?

fightnyy, Feb 17 '21 16:02