
Training from scratch?

bkj opened this issue 6 years ago · 2 comments

I see that you provide code for finetuning the pretrained models -- do you think that this code is also appropriate for training a model from scratch? Or are there other repos that you think would be more appropriate for from-scratch training?

Thanks!

bkj · Jun 13 '19 17:06

I don't see why not. Well, I suppose it depends on exactly what you mean by "from scratch". For many tasks I would probably start with the released GPT-2 anyway and "fine tune" it to a completely different task (like generating C code), because if nothing else the released model works as a good initialization, with the correct scales for all parameters.

If you want to use a different embedding/encoding for a different kind of data (like generating a vocabulary specifically for C code rather than English prose), that would certainly be possible too, though I haven't added anything yet to specifically support that.
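For illustration, one way to build such a vocabulary outside this repo is the Hugging Face `tokenizers` package; this is just a sketch, the corpus path, vocabulary size, and output directory are placeholders, and the resulting `vocab.json` / `merges.txt` would still need to be adapted to the `encoder.json` / `vocab.bpe` files this repo's `encoder.py` reads (and `n_vocab` in the hparams changed to match):

```python
# Sketch only -- uses the Hugging Face `tokenizers` package (pip install tokenizers),
# which is not part of this repo. File names and sizes are illustrative.
import os
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["c_corpus.txt"],            # hypothetical training corpus of C source
    vocab_size=32000,                  # smaller than GPT-2's 50257 is fine for code
    min_frequency=2,
    special_tokens=["<|endoftext|>"],  # keep GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt into the directory; these would need to be
# converted to the encoder.json / vocab.bpe format the repo expects.
os.makedirs("models/c_code", exist_ok=True)
tokenizer.save_model("models/c_code")
```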

If you want to not use the released model at all, for instance because you want to train a model with incompatible hyperparameters, it should be sufficient to just skip the restore from the released model checkpoint (around train.py:164-177) on your first run so the parameters will all be randomly initialized.
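As a minimal sketch of that idea (not the actual code around train.py:164-177, and assuming the TF 1.x session/Saver setup the repo uses; `from_scratch` is a hypothetical flag):

```python
import os
import tensorflow as tf  # the repo targets TF 1.x

def init_params(sess, saver, model_name, from_scratch):
    """Sketch only, not the exact logic in train.py.

    Randomly initialize every variable, then restore the released GPT-2
    checkpoint unless `from_scratch` (a hypothetical flag) is set.
    """
    sess.run(tf.global_variables_initializer())  # random init for all parameters
    if not from_scratch:
        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
        if ckpt is not None:
            saver.restore(sess, ckpt)  # overwrite the random init with pretrained weights
    # when from_scratch is True, the parameters keep their random initial values
```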

nshepperd · Jun 16 '19 17:06

@nshepperd I see that code generation is mentioned here. Did you try GPT-2 on a code generation task? Thanks!

carter54 · Jul 23 '19 02:07