datablations

Validation loss vs model size per step

Open orena1 opened this issue 8 months ago • 3 comments

Hi @Muennighoff, great paper: very impressive and very detailed work. Thanks for releasing the data! I noticed a small discrepancy between your results and the scaling rules. I replotted the data from Figure 15 for 1 epoch, with all 3 models on one plot:

Image

In the Scaling Rules image, you can see that models with more parameters converge faster and reach a better loss. But in your experiments, the 9B-parameter model seems to behave differently. What are your thoughts on this? Thanks!
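For anyone who wants to check this numerically rather than by eye, here is a minimal sketch. It assumes the per-step validation losses have been loaded as (step, loss) pairs per model. The helper names `best_model_per_step` and `crossover_step` and the toy numbers are made up for illustration; they are not from the paper or the repository.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical container: model label -> list of (step, validation_loss) pairs.
Curves = Dict[str, List[Tuple[int, float]]]


def best_model_per_step(curves: Curves) -> Dict[int, str]:
    """For every step present in all curves, return the label with the lowest loss."""
    lookup = {label: dict(points) for label, points in curves.items()}
    shared_steps = set.intersection(*(set(d) for d in lookup.values()))
    return {
        step: min(lookup, key=lambda label: lookup[label][step])
        for step in sorted(shared_steps)
    }


def crossover_step(curves: Curves, small: str, large: str) -> Optional[int]:
    """First shared step at which `large` has strictly lower loss than `small`.

    Returns None if the larger model never gets ahead in the logged range.
    """
    small_curve, large_curve = dict(curves[small]), dict(curves[large])
    for step in sorted(set(small_curve) & set(large_curve)):
        if large_curve[step] < small_curve[step]:
            return step
    return None


# Toy numbers only, NOT taken from the paper's data.
toy: Curves = {
    "small": [(100, 4.0), (200, 3.5), (300, 3.3)],
    "large": [(100, 4.2), (200, 3.6), (300, 3.1)],
}

if __name__ == "__main__":
    print(best_model_per_step(toy))
    print(crossover_step(toy, "small", "large"))
```

If the scaling-rule picture held at every logged step, `crossover_step` would return the very first step for each pair of sizes. A late crossover, or `None`, would pinpoint where the larger model lags.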

orena1 · Feb 20 '25 03:02