Add a GPT-2 training example
We would like to use these issues to gauge user interest.
It is possible to use the GPT-2 implementation for further language model training, but there is currently no example demonstrating this, either in the repo or elsewhere.
Making this feasible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed for training. There are a number of options (a rough sketch combining several of these follows the list):
- Add support for a smaller GPT-2 model.
- Train only a subset of the GPT-2 parameters.
- Use gradient accumulation.
- Use gradient checkpointing.
- Use reduced-precision gradients.
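
As a point of reference, here is a minimal sketch in PyTorch of how three of these options (training a subset of parameters, gradient accumulation, and reduced-precision gradients) might be combined. This is not the repo's API: the model, data loader, and the `"lm_head"` parameter-name filter are hypothetical stand-ins, and gradient checkpointing is omitted because it is model-specific.

```python
# Hypothetical sketch: memory-efficient fine-tuning loop combining
# parameter freezing, gradient accumulation, and mixed-precision training.
import torch
import torch.nn as nn

def train_memory_efficient(model: nn.Module, data_loader, num_epochs=1,
                           accumulation_steps=8, lr=1e-4, device="cuda"):
    model.to(device)

    # Option: train only a subset of the parameters. Here everything is frozen
    # except parameters whose name contains "lm_head" (an assumed naming scheme).
    for name, param in model.named_parameters():
        param.requires_grad = "lm_head" in name

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=lr)

    # Option: reduced precision. Forward/backward run in FP16 while the scaler
    # keeps the optimizer step numerically stable.
    scaler = torch.cuda.amp.GradScaler()
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(num_epochs):
        optimizer.zero_grad()
        for step, (inputs, targets) in enumerate(data_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            with torch.cuda.amp.autocast():
                # Assumes the model returns logits of shape [batch, seq, vocab].
                logits = model(inputs)
                loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
                # Option: gradient accumulation. Averaging over the accumulation
                # window gives an effective batch size of
                # batch_size * accumulation_steps.
                loss = loss / accumulation_steps
            scaler.scale(loss).backward()
            if (step + 1) % accumulation_steps == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad()
```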
I have received one inquiry about fine-tuning GPT-2.
I second the request to be able to fine-tune GPT-2.