
Add a GPT-2 training example

Open · bwdGitHub opened this issue Dec 10 '21 · 2 comments

We would like to use these issues to gauge user interest.

It is possible to use the GPT-2 implementation for further language-model training, but there is currently no example demonstrating this in the repo or elsewhere.

Making this feasible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed for training. There are a number of options (a sketch of option 3 appears after the list):

  1. Add support for a smaller GPT-2 model.
  2. Only train a subset of the GPT-2 parameters.
  3. Use gradient accumulation.
  4. Use gradient checkpointing.
  5. Use reduced-precision gradients.
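
For reference, here is a minimal sketch of what option 3 (gradient accumulation) could look like with the Deep Learning Toolbox custom-training-loop functions (`dlfeval`, `dlgradient`, `dlupdate`, `adamupdate`). The names `parameters` (a struct of dlarray learnables), `nextBatch`, and `gpt2ForwardLoss` are placeholders for illustration, not part of the repo's current interface:

```matlab
% Gradient accumulation: average gradients over accumSteps mini-batches,
% then apply a single Adam update, giving the effect of a larger batch
% without the extra activation memory.
accumSteps  = 8;
numSteps    = 1000;
learnRate   = 1e-4;
avgGrad     = [];            % Adam solver state
avgSqGrad   = [];
updateCount = 0;
accumGrad   = [];

for step = 1:numSteps
    [X, T] = nextBatch();    % placeholder: one mini-batch of token data

    % Loss and gradients for this mini-batch.
    [loss, grad] = dlfeval(@gpt2ForwardLoss, parameters, X, T);

    % Scale by 1/accumSteps and add into the running gradient rather
    % than updating the weights immediately.
    grad = dlupdate(@(g) g./accumSteps, grad);
    if isempty(accumGrad)
        accumGrad = grad;
    else
        accumGrad = dlupdate(@plus, accumGrad, grad);
    end

    % One optimizer step per accumSteps mini-batches.
    if mod(step, accumSteps) == 0
        updateCount = updateCount + 1;
        [parameters, avgGrad, avgSqGrad] = adamupdate( ...
            parameters, accumGrad, avgGrad, avgSqGrad, ...
            updateCount, learnRate);
        accumGrad = [];
    end
end

function [loss, gradients] = gpt2ForwardLoss(parameters, X, T)
    % Placeholder forward pass and next-token cross-entropy loss.
    Y = model(X, parameters);
    loss = crossentropy(Y, T);
    gradients = dlgradient(loss, parameters);
end
```

Option 2 composes naturally with this loop: calling `dlgradient` on only a subset of the learnables computes gradients for just that subset, so solver state for the frozen weights never needs to be allocated.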

bwdGitHub · Dec 10 '21

I have received one inquiry about fine-tuning GPT-2.

misataguchi · Mar 05 '22

I second the request to be able to fine-tune GPT-2.

qwer1304 · Dec 19 '22