Update documentation
First of all, thank you for this repo. It allowed me to start training a GPT model from random initialization.
A couple of things that I have noticed:
- The README.md is out of date in that the `--keep-newlines` option is not available in the preprocessing script.
- The documentation refers to finetuning a few times, but there is no mention of how to perform finetuning from GPT-Neo or GPT-J.
- Some additional documentation on how the different model sizes in `configs/` relate to the GPT-3 sizes or to GPT-Neo/J would be helpful.
- The data paths for the configs in the README.md do not include the `_text_document` suffix that the preprocessing script appends.
> The README.md is out of date in that the `--keep-newlines` option is not available in the preprocessing script.
Thank you for pointing this out.
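For reference, a corrected invocation (without `--keep-newlines`) would look roughly like the sketch below. The flag names and file paths are assumptions based on the README's example and should be checked against `python tools/preprocess_data.py --help`:

```bash
# Sketch of the preprocessing call without --keep-newlines; flags and paths are
# assumed from the README's example, not verified against the current script.
python tools/preprocess_data.py \
    --input ./data/mydataset.jsonl \
    --output-prefix ./data/mydataset \
    --vocab ./data/gpt2-vocab.json \
    --merge-file ./data/gpt2-merges.txt \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --append-eod \
    --workers 8
```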
> The documentation refers to finetuning a few times, but there is no mention of how to perform finetuning from GPT-Neo or GPT-J.
Finetuning is mechanistically the same as training; it is just invoked through `train.py`.
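As a rough sketch of what that means in practice, and assuming the Megatron-style config keys `load` and `finetune` plus the README's `deepy.py` launch form (none of which are verified here), finetuning amounts to adding a small override config and launching training as usual:

```bash
# Hypothetical finetuning setup: reuse the normal training entry point, but start
# from pretrained weights. The key names "load" and "finetune" are assumptions.
cat > configs/finetune_override.yml <<'EOF'
{
  # Directory containing the pretrained checkpoint (e.g. a converted GPT-Neo model).
  "load": "checkpoints/pretrained_model",
  # Assumed flag: load weights only, skipping optimizer state and the iteration count.
  "finetune": true
}
EOF

# Launch exactly as for training from scratch, with the override config appended.
python ./deepy.py train.py configs/large.yml configs/local_setup.yml configs/finetune_override.yml
```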
> Some additional documentation on how the different model sizes in `configs/` relate to the GPT-3 sizes or to GPT-Neo/J would be helpful.
We use the same naming convention as GPT-3. If you want to train the model referred to as GPT-3 Large in the paper, you use `configs/large.yml`. To train a model the same size as GPT-Neo, select `configs/2-7b.yml`, as GPT-Neo is a 2.7B parameter model.
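For example, the mapping and launch might look like this (parameter counts are from the GPT-3 paper; the launch form and `configs/local_setup.yml` are assumptions based on the README's training example):

```bash
# configs/large.yml   ~ GPT-3 Large   (~760M parameters)
# configs/2-7b.yml    ~ GPT-Neo 2.7B  (comparable to GPT-3 2.7B)
# Launch form assumed from the README; local_setup.yml carries the data/logging paths.
python ./deepy.py train.py configs/large.yml configs/local_setup.yml
```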
> The data paths for the configs in the README.md do not include the `_text_document` suffix that the preprocessing script appends.
Good idea, this should be explicitly discussed.
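To make the mismatch concrete, here is a hypothetical example: with `--output-prefix ./data/mydataset`, the preprocessing script writes files carrying the `_text_document` suffix, and the config has to reference that suffixed prefix (the `data-path` key name below is an assumption for illustration):

```bash
# Assume preprocessing was run with --output-prefix ./data/mydataset, as in the
# sketch further up. The script appends "_text_document", so the directory holds:
ls ./data/
#   mydataset_text_document.bin
#   mydataset_text_document.idx

# The config must therefore point at the suffixed prefix, not the raw --output-prefix:
#   "data-path": "./data/mydataset_text_document"
```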
@StellaAthena If no one is working on it, then I would love to work on it.
> @StellaAthena If no one is working on it, then I would love to work on it.
@divyanshugit If these changes haven't already been merged, then yes, nobody is working on it. Feel free to pick it up :)