nanoGPT
Why do we need further pretraining given the loss has already converged
I observe that the loss converges around 100,000 steps. Why do we need to continue training the model to 600,000 steps?
You don't have to. From the looks of it, the max steps value in the code is just an arbitrary default, as is the learning-rate decay schedule that is set to match it. It's your choice what to put there; you can change it through the config files or the CLI.
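For reference, here is a minimal sketch of shortening the run, assuming the stock `train.py` parameter names `max_iters` and `lr_decay_iters` (the filename `my_short_run.py` is just an illustrative name):

```python
# my_short_run.py -- a hypothetical nanoGPT config file (sketch).
# Any top-level assignment here overrides the default of the same
# name in train.py via the configurator.
max_iters = 100000       # stop near where you observe convergence
lr_decay_iters = 100000  # keep the cosine LR schedule in sync with max_iters
```

You would then run `python train.py my_short_run.py`, or skip the file and override directly on the command line, e.g. `python train.py --max_iters=100000 --lr_decay_iters=100000`. Keeping `lr_decay_iters` equal to `max_iters` matters because the cosine schedule only reaches its minimum learning rate at `lr_decay_iters`.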