Add a GPT-2 training example
We would like to use these issues to gauge user interest.
It is possible to use the GPT-2 implementation for further language model training, but there is currently no example demonstrating this, either in the repo or elsewhere.
Making this feasible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed for training. There are a number of options (a rough sketch combining several of these follows the list):
- Add support for a smaller GPT-2 model.
- Train only a subset of the GPT-2 parameters.
- Use gradient accumulation.
- Use gradient checkpointing.
- Use reduced-precision gradients.
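
As a point of reference, here is a minimal sketch in PyTorch of how three of these options (training a subset of parameters, gradient accumulation, and reduced-precision gradients) might be combined. This is not the repo's API: the model, data loader, and the `"lm_head"` parameter-name filter are hypothetical stand-ins, and gradient checkpointing is omitted because it is model-specific.

```python
# Hypothetical sketch: memory-efficient fine-tuning loop combining
# parameter freezing, gradient accumulation, and mixed-precision training.
import torch
import torch.nn as nn

def train_memory_efficient(model: nn.Module, data_loader, num_epochs=1,
                           accumulation_steps=8, lr=1e-4, device="cuda"):
    model.to(device)

    # Option: train only a subset of the parameters. Here everything is frozen
    # except parameters whose name contains "lm_head" (an assumed naming scheme).
    for name, param in model.named_parameters():
        param.requires_grad = "lm_head" in name

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=lr)

    # Option: reduced precision. Forward/backward run in FP16 while the scaler
    # keeps the optimizer step numerically stable.
    scaler = torch.cuda.amp.GradScaler()
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(num_epochs):
        optimizer.zero_grad()
        for step, (inputs, targets) in enumerate(data_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            with torch.cuda.amp.autocast():
                # Assumes the model returns logits of shape [batch, seq, vocab].
                logits = model(inputs)
                loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
                # Option: gradient accumulation. Averaging over the accumulation
                # window gives an effective batch size of
                # batch_size * accumulation_steps.
                loss = loss / accumulation_steps
            scaler.scale(loss).backward()
            if (step + 1) % accumulation_steps == 0:
                scaler.step(optimizer)
                scaler.update()
                optimizer.zero_grad()
```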
I have received one inquiry about fine-tuning GPT-2.
I second the request to be able to fine-tune GPT-2.