torchscale
[Minor issue] Discrepancy inside arXiv paper
Hi, first of all thank you for the nice work.
While reading the paper, I noticed that the weight decay given in the appendix differs from the one given in the main body.
https://arxiv.org/pdf/2307.08621.pdf
The gap between a weight decay of 0.01 and 0.05 is quite large, so it may be worth double-checking the two values and making them consistent. Or are they configs for different models?