nanoGPT
Training loss converges well before max_iters
In train.py, max_iters is set to 600000, but the loss gets close to 2.8 much earlier (around iteration 300000) and then just fluctuates around that value. Could we do early stopping at that point and save a checkpoint?
iter 351054: loss 2.7325, time 107.20ms, mfu 31.66%
iter 351055: loss 2.8947, time 105.83ms, mfu 31.68%
Should we add early stopping to train.py, @karpathy?
That's just an arbitrary number; it can be whatever you want. Checkpoints are saved all the time, so you can add a stop at a certain loss, wait out the fluctuations, or wait for a lower loss, whatever you like. There's no real need for this?
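
For anyone who does want this, a minimal sketch of a patience-based stop is below. It assumes the usual train.py structure (an estimate_loss() helper returning a dict with a 'val' entry, evaluated every eval_interval iterations, with checkpoints already saved when validation loss improves); EarlyStopper and its parameters are illustrative names, not existing nanoGPT options.

```python
# Sketch of patience-based early stopping as a standalone helper.
# Names like `patience` and `min_delta` are assumptions, not nanoGPT config options.

class EarlyStopper:
    """Signal a stop after `patience` evaluations without val-loss improvement."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0       # improvement: reset the counter
        else:
            self.bad_evals += 1      # no improvement this eval
        return self.bad_evals >= self.patience


# Inside a train.py-style loop this could be called at each eval, e.g.:
#   stopper = EarlyStopper(patience=5)
#   ...
#   if iter_num % eval_interval == 0:
#       losses = estimate_loss()      # train.py's eval helper
#       if stopper.step(losses['val']):
#           break                     # a checkpoint was already saved on the last improvement
```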