nanoGPT
Why do we need further pretraining given the loss has already converged
I observe that the loss converges around 100,000 steps. Why do we need to continue training the model to 600,000 steps?
You don't have to. From the looks of it, the max steps value in the code is just an arbitrary default, as is the learning-rate decay schedule that is set to match it. It's your choice what to put there; you can change it through the config files or the CLI.
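For reference, here is a minimal sketch of shortening the run, assuming the stock `train.py` parameter names `max_iters` and `lr_decay_iters` (the filename `my_short_run.py` is just an illustrative name):

```python
# my_short_run.py -- a hypothetical nanoGPT config file (sketch).
# Any top-level assignment here overrides the default of the same
# name in train.py via the configurator.
max_iters = 100000       # stop near where you observe convergence
lr_decay_iters = 100000  # keep the cosine LR schedule in sync with max_iters
```

You would then run `python train.py my_short_run.py`, or skip the file and override directly on the command line, e.g. `python train.py --max_iters=100000 --lr_decay_iters=100000`. Keeping `lr_decay_iters` equal to `max_iters` matters because the cosine schedule only reaches its minimum learning rate at `lr_decay_iters`.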