nanoGPT
copy model args from checkpoint model when resuming training
I'm not sure how you intend the passed-in params to interact with the checkpoint params; however, it seems logical to copy the checkpoint params into the model params when resuming training.
Some context on why I came across the need for this change:
I finetuned gpt2-xl on some data for 1000 iterations, and checkpoints were saved. I used a finetune config similar to https://github.com/karpathy/nanoGPT/blob/master/config/finetune_shakespeare.py, with init_from set to 'gpt2-xl'.
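For reference, the config was roughly along these lines (values are illustrative, not my exact settings); the key point is that it never sets architecture args like n_layer or n_embd:

```python
# finetune config sketch (illustrative values, not my exact run)
out_dir = 'out-finetune'
eval_interval = 5
eval_iters = 40
always_save_checkpoint = True   # keep checkpoints so the run can be resumed

dataset = 'shakespeare'
init_from = 'gpt2-xl'           # start from the pretrained gpt2-xl weights

batch_size = 1
gradient_accumulation_steps = 32
max_iters = 1000

learning_rate = 3e-5            # finetune at a constant, small LR
decay_lr = False

# note: no n_layer / n_head / n_embd here -- the architecture comes from the
# pretrained model, so on resume train.py falls back to its own defaults
```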
For a second run, I tried to resume training (init_from = 'resume'), but the model params defaulted to the architecture params hard-coded in train.py (which correspond to the base 'gpt2' model), and the assert that checks the checkpoint args against these model args failed.
So instead of adding all the architecture args to my configs, I thought we could just copy the checkpoint args. Alternatively, we could copy only the model architecture args, so that args like dropout can still be changed when resuming training. Something along the lines of the sketch below.
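A minimal sketch of what I mean, assuming the usual train.py globals (out_dir, device, model_args); the helper name and the dropout exclusion are just my framing, not a final patch:

```python
import os
import torch

def resume_model_args(out_dir, device, model_args):
    """Load the checkpoint and copy its model args over the config defaults.

    Sketch only: mirrors the 'resume' branch in train.py, but copies the
    checkpoint's architecture args instead of asserting they already match.
    """
    ckpt_path = os.path.join(out_dir, 'ckpt.pt')
    checkpoint = torch.load(ckpt_path, map_location=device)
    checkpoint_model_args = checkpoint['model_args']
    for k, v in checkpoint_model_args.items():
        # keep training-time knobs like dropout overridable from the config
        if k == 'dropout':
            continue
        model_args[k] = v
    return model_args, checkpoint
```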
@karpathy - any thoughts?