nanoGPT

Training loss converges much earlier compared to max_iters

Open goswamig opened this issue 1 year ago • 1 comment

In train.py, max_iters is set to 600000, but the loss gets close to 2.8 much earlier, around iteration 300000, and just fluctuates there. I wonder if we can do an early stop at that point and save a checkpoint?

iter 351054: loss 2.7325, time 107.20ms, mfu 31.66%
iter 351055: loss 2.8947, time 105.83ms, mfu 31.68%

Should we add early stopping to train.py, @karpathy?

goswamig avatar Mar 21 '24 04:03 goswamig
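
Since the question is about stopping early and saving a checkpoint, here is a minimal sketch of what a loss-based early stop could look like. The EarlyStopper class and its target_loss/patience/min_delta knobs are hypothetical additions, not part of nanoGPT's train.py; the wiring shown in the trailing comments assumes train.py's existing iter_num, eval_interval, master_process, and estimate_loss names.

import math

# Hypothetical early-stopping helper (not part of nanoGPT's train.py).
class EarlyStopper:
    def __init__(self, target_loss=None, patience=10, min_delta=0.0):
        self.target_loss = target_loss   # stop once val loss is at/below this, if set
        self.patience = patience         # evals allowed without improvement
        self.min_delta = min_delta       # minimum drop that counts as improvement
        self.best = math.inf
        self.bad_evals = 0

    def should_stop(self, val_loss: float) -> bool:
        # absolute target reached
        if self.target_loss is not None and val_loss <= self.target_loss:
            return True
        # track improvement, otherwise count a "bad" eval
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

# Possible wiring into a train.py-style loop (names assumed from train.py):
#
# stopper = EarlyStopper(target_loss=2.8, patience=10)
# ...
# if iter_num % eval_interval == 0 and master_process:
#     losses = estimate_loss()
#     # train.py already saves a checkpoint here when the val loss improves
#     if stopper.should_stop(losses['val'].item()):
#         break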

That's just an arbitrary number; it can be whatever you want. Checkpoints are saved all the time, and you can add a stop for a certain loss, or wait for the fluctuations, or for a lower loss, whatever. There's no real need for this?

VatsaDev avatar Mar 22 '24 22:03 VatsaDev
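
As the reply says, 600000 is just the default; train.py reads command-line overrides through configurator.py, so a shorter run can be requested directly with something like:

python train.py config/train_gpt2.py --max_iters=300000

(300000 here is just the value mentioned in the question, not a recommended setting.)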