Validation loss vs model size per step

Open orena1 opened this issue 8 months ago • 3 comments

Hi @Muennighoff, great paper: very impressive and detailed work, and thanks for releasing the data! I noticed a small discrepancy between your results and the scaling rules. I replotted the data from Figure 15 for 1 epoch, with all 3 models on one plot:

You can see in the Scaling Rules image that models with more parameters converge faster and reach a better loss. But in your experiments, the 9B-parameter model seems to behave differently. What are your thoughts on this? Thanks!
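
For anyone who wants to check this numerically rather than by eye, here is a minimal sketch of the comparison, assuming each model's validation curve is available as a list of `(step, loss)` pairs. The function names, the model sizes used as keys, and all loss values below are hypothetical placeholders for illustration; they are not taken from the paper or its released data.

```python
from typing import Dict, List, Tuple

# Curve for one model: [(training_step, validation_loss), ...]
Curve = List[Tuple[int, float]]


def loss_at_step(curve: Curve, step: int) -> float:
    """Linearly interpolate the validation loss at `step`."""
    pts = sorted(curve)
    if not pts[0][0] <= step <= pts[-1][0]:
        raise ValueError(f"step {step} is outside the logged range")
    for (s0, l0), (s1, l1) in zip(pts, pts[1:]):
        if s0 <= step <= s1:
            if s1 == s0:
                return l0
            t = (step - s0) / (s1 - s0)
            return l0 + t * (l1 - l0)
    return pts[-1][1]  # only reached when the curve has a single point


def ordering_violations(
    curves: Dict[float, Curve], steps: List[int]
) -> List[Tuple[int, float, float]]:
    """Return (step, smaller_size, larger_size) wherever the larger of two
    adjacent model sizes has the HIGHER validation loss at that step."""
    sizes = sorted(curves)
    found = []
    for step in steps:
        losses = [loss_at_step(curves[n], step) for n in sizes]
        for i in range(len(sizes) - 1):
            if losses[i + 1] > losses[i]:
                found.append((step, sizes[i], sizes[i + 1]))
    return found


# Placeholder data, keyed by parameter count in billions (made-up values).
curves = {
    1.0: [(0, 4.0), (1000, 3.0), (2000, 2.8)],
    3.0: [(0, 4.0), (1000, 2.8), (2000, 2.6)],
    9.0: [(0, 4.0), (1000, 2.9), (2000, 2.5)],
}

if __name__ == "__main__":
    print(ordering_violations(curves, [1000, 2000]))
```

An empty result would mean that, at every checked step, the larger model has the lower (or equal) loss. Any returned triple marks a step where a larger model is behind a smaller one, which is the behaviour being asked about here.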

Feb 20 '25 03:02 orena1
