datablations
Scaling Data-Constrained Language Models
Hi, I am reading your paper and I have noticed that figure 4 and figure 15 are exactly the same. Are they meant to be the same? I believe that...
"Scaling Data-Constrained Language Models" is a very nice paper, and I learned a lot from it. However, I have a question about this paper: In the abstract and Figure...
This should make it easier for us to investigate scaling laws @TevenLeScao
Hi authors, thanks for the great work! I just wonder whether LR=1e-3 for mup is the optimal value from a small-scale proxy model, and how critical dropout is for multi-epoch training. for...
Hi @Muennighoff Great paper, very impressive work and very detailed - thanks for releasing the data! I wonder about a small discrepancy that I see between your work and scaling...