nanoGPT
copy model args from checkpoint model when resuming training
I'm not sure how you intend the passed-in params to interact with the checkpoint params; however, it seems logical to copy the checkpoint params into the model params when resuming training.
Some context on why I came across the need for this change:
I finetuned gpt2-xl on some data for 1000 iterations, and checkpoints were saved. I used a finetune config similar to https://github.com/karpathy/nanoGPT/blob/master/config/finetune_shakespeare.py, with init_from set to 'gpt2-xl'.
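For reference, the config was roughly along these lines (values are illustrative, not my exact settings); the key point is that it never sets architecture args like n_layer or n_embd:

```python
# finetune config sketch (illustrative values, not my exact run)
out_dir = 'out-finetune'
eval_interval = 5
eval_iters = 40
always_save_checkpoint = True   # keep checkpoints so the run can be resumed

dataset = 'shakespeare'
init_from = 'gpt2-xl'           # start from the pretrained gpt2-xl weights

batch_size = 1
gradient_accumulation_steps = 32
max_iters = 1000

learning_rate = 3e-5            # finetune at a constant, small LR
decay_lr = False

# note: no n_layer / n_head / n_embd here -- the architecture comes from the
# pretrained model, so on resume train.py falls back to its own defaults
```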
For a second run, I tried to resume training (init_from = 'resume'), but the model params defaulted to the architecture params hard-coded in train.py (which correspond to the base 'gpt2' model), and the assert that checks the checkpoint args against these model args failed.
So instead of adding all the architecture args to my configs, I thought we could just copy the checkpoint args. Alternatively, we could copy only the model architecture args, so that args like dropout can still be changed when resuming training. Something along the lines of the sketch below.
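A minimal sketch of what I mean, assuming the usual train.py globals (out_dir, device, model_args); the helper name and the dropout exclusion are just my framing, not a final patch:

```python
import os
import torch

def resume_model_args(out_dir, device, model_args):
    """Load the checkpoint and copy its model args over the config defaults.

    Sketch only: mirrors the 'resume' branch in train.py, but copies the
    checkpoint's architecture args instead of asserting they already match.
    """
    ckpt_path = os.path.join(out_dir, 'ckpt.pt')
    checkpoint = torch.load(ckpt_path, map_location=device)
    checkpoint_model_args = checkpoint['model_args']
    for k, v in checkpoint_model_args.items():
        # keep training-time knobs like dropout overridable from the config
        if k == 'dropout':
            continue
        model_args[k] = v
    return model_args, checkpoint
```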
@karpathy - any thoughts?