
Why do we need further pretraining given that the loss has already converged?

Open · BiEchi opened this issue 11 months ago • 1 comment

I observe that the loss converges at around 100,000 steps. Why do we need to keep training the model all the way to 600,000 steps?

BiEchi avatar Mar 12 '24 21:03 BiEchi

You don't have to. From the looks of it, the max steps value in the code is an arbitrary number, much like the learning rate being set to its default maximum. It's your choice what to put there; the config is exposed in the config files and via the CLI.
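For reference, here is a minimal sketch of what such an override might look like, assuming nanoGPT's usual pattern of a Python config file whose variables (e.g. `max_iters`, `lr_decay_iters`, `learning_rate`) can also be overridden as `--key=value` arguments to `train.py` — verify the exact names against your checkout:

```python
# Hypothetical shortened run, e.g. saved as config/train_gpt2_short.py.
# Variable names assume the stock nanoGPT config; check train.py for yours.

max_iters = 100000        # stop near where the loss plateaus
lr_decay_iters = 100000   # keep the cosine LR decay schedule aligned with max_iters
learning_rate = 6e-4      # unchanged default; tune separately if the loss oscillates
```

which could then be launched with something like `python train.py config/train_gpt2_short.py`, or equivalently by passing `--max_iters=100000 --lr_decay_iters=100000` on the command line.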

VatsaDev avatar Mar 22 '24 23:03 VatsaDev