nanoGPT
Is there a big difference in the model quality between a loss of 3.0 and 2.9?
I am training on a single GPU, and the loss gets to ~3.1 after ~24 hours of training; at this point the loss is decreasing very slowly. Just out of curiosity, is there a big difference in model quality between a loss of 3.1, 3.0, and 2.9? Thanks! Training for another few days may bring the loss down to ~2.9 (haven't tried yet). Is it worth it?
At an eval loss of ~3.09, I tried a few of the context completion tasks described in the GPT-2 paper, but the completion quality is not great. It's okay, but not as good as in the GPT-2 paper; many of the generated texts don't make much sense.
Yes, each additional drop in loss makes a big difference, and increasingly so the further you get in training.
The first few points of loss are spent on the most boring things, like learning that sentences end with "." and that spaces are important. All the interesting stuff gets learned much later, and it only accounts for a tiny amount of loss, relatively speaking.
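One way to put these numbers in perspective (assuming the reported loss is the usual per-token cross-entropy in nats, as in nanoGPT): exponentiating the loss gives the perplexity, i.e. roughly how many tokens the model is "choosing between" at each step. A quick sketch:

```python
import math

# Cross-entropy loss (in nats) is the average negative log-likelihood
# per token, so perplexity = exp(loss).
for loss in (3.1, 3.0, 2.9):
    print(f"loss {loss:.1f} -> perplexity {math.exp(loss):.1f}")
```

By this measure, going from 3.1 to 2.9 shrinks the perplexity from about 22 to about 18, so each 0.1 of loss is roughly a 10% reduction in the model's effective uncertainty per token, which can be quite noticeable in sample quality.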
Ah I see, thank you very much!
@vesuppi which GPU are you using? Also, could you share your loss curves? I'm trying to pretrain on a single RTX 4090.