nanoGPT
Is there a big difference in the model quality between a loss of 3.0 and 2.9?
I am training on a single GPU, and the loss gets to ~3.1 after ~24 hours of training; at this point the loss is decreasing very slowly. Just out of curiosity, is there a big difference in model quality between a loss of 3.1, 3.0, and 2.9? Thanks! Training for another few days may bring the loss down to ~2.9 (haven't tried yet). Is it worth it?
At an eval loss of ~3.09, I tried a few of the context completion tasks described in the GPT-2 paper, but the completion quality is not great. It's okay, but not as good as in the GPT-2 paper; many of the generated texts don't make much sense.
Yes, each additional drop in loss makes a big difference, and increasingly so the further you get in training.
The first few points of loss are spent on the most boring things, like learning that sentences end with "." and that spaces are important. All the interesting stuff gets learned much later, and it only accounts for a tiny amount of loss, relatively speaking.
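One way to put these numbers in perspective (assuming the reported loss is the usual per-token cross-entropy in nats, as in nanoGPT): exponentiating the loss gives the perplexity, i.e. roughly how many tokens the model is "choosing between" at each step. A quick sketch:

```python
import math

# Cross-entropy loss (in nats) is the average negative log-likelihood
# per token, so perplexity = exp(loss).
for loss in (3.1, 3.0, 2.9):
    print(f"loss {loss:.1f} -> perplexity {math.exp(loss):.1f}")
```

By this measure, going from 3.1 to 2.9 shrinks the perplexity from about 22 to about 18, so each 0.1 of loss is roughly a 10% reduction in the model's effective uncertainty per token, which can be quite noticeable in sample quality.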
Ah I see, thank you very much!
@vesuppi which GPU are you using? Also, could you share your loss curves? I'm trying to pretrain on a single RTX 4090.