gpt-neox

Update documentation

Open · ayl opened this issue 3 years ago • 3 comments

First of all, thank you for this repo. It allowed me to start training a GPT model from random initialization.

A couple of things that I have noticed:

  • The README.md is out of date: the option "--keep-newlines" is not available in the preprocessing script.
  • The documentation refers to finetuning a few times, but there is no mention of how to finetune from GPT-Neo or GPT-J.
  • Some additional documentation about how the different-size models in configs/ relate to the GPT-3 sizes or to GPT-Neo/J would be helpful.
  • The data paths for the configs in the README.md do not include the "_text_document" suffix that the preprocessing script appends.

ayl, Feb 07 '22 02:02

> The README.md is out of date: the option "--keep-newlines" is not available in the preprocessing script.

Thank you for pointing this out.

> The documentation refers to finetuning a few times, but there is no mention of how to finetune from GPT-Neo or GPT-J.

Mechanically, finetuning is the same as training: you just call train.py.

> Some additional documentation about how the different-size models in configs/ relate to the GPT-3 sizes or to GPT-Neo/J would be helpful.

We use the same naming convention as GPT-3. If you want to train the model referred to as GPT-3 Large in the paper, use configs/large.yml. To train a model the same size as GPT-Neo, select configs/2-7b.yml, since GPT-Neo is a 2.7B-parameter model.
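To see roughly why a given shape lands at a given GPT-3 size, a common back-of-the-envelope estimate is about 12·L·d² parameters for L transformer layers of width d (4·d² for attention, 8·d² for the MLP), plus V·d for the token embedding. The (layers, d_model) pairs below are the ones listed for each size in the GPT-3 paper; the vocabulary size of 50257 (GPT-2 BPE) and the approximation itself, which ignores biases, layer norms and position embeddings, are assumptions, and the actual gpt-neox config files may use somewhat different shapes. This is only a sketch for orientation.

```python
# GPT-3 model shapes (n_layers, d_model) as listed in the GPT-3 paper.
GPT3_SHAPES = {
    "Small": (12, 768),
    "Medium": (24, 1024),
    "Large": (24, 1536),
    "XL": (24, 2048),
    "2.7B": (32, 2560),
    "6.7B": (32, 4096),
    "13B": (40, 5140),
    "175B": (96, 12288),
}

# Assumption: GPT-2 BPE vocabulary size; gpt-neox may pad this differently.
VOCAB_SIZE = 50257


def approx_params(n_layers: int, d_model: int, vocab_size: int = VOCAB_SIZE) -> int:
    """Rough parameter count: 12*d^2 per layer plus a (tied) token embedding.

    Ignores biases, layer norms and position embeddings, so it is an estimate only.
    """
    per_layer = 12 * d_model ** 2  # 4*d^2 attention projections + 8*d^2 MLP
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    for name, (n_layers, d_model) in GPT3_SHAPES.items():
        print(f"{name:>6}: ~{approx_params(n_layers, d_model) / 1e9:.2f}B parameters")
```

Under these assumptions the "Large" shape comes out near 0.76B and the "2.7B" shape near 2.65B, which is why the size names, rather than exact counts, are the useful handle when picking a config.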

> The data paths for the configs in the README.md do not include the "_text_document" suffix that the preprocessing script appends.

Good idea, this should be explicitly discussed.
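The suffix rule under discussion can be sketched as follows, assuming the preprocessing script follows the Megatron-style convention of writing `<output-prefix>_<json-key>_document.bin` and `.idx`, with `text` as the default JSON key. Under that assumption, the data path a config should reference is the output prefix plus `_text_document`, without any file extension. The helper names `config_data_path` and `expected_files` are hypothetical and not part of gpt-neox.

```python
from typing import List


def config_data_path(output_prefix: str, json_key: str = "text", level: str = "document") -> str:
    """Hypothetical helper: the data path a config should point at.

    Assumes the preprocessing script names its outputs
    f"{output_prefix}_{json_key}_{level}.bin" and ".idx".
    """
    return f"{output_prefix}_{json_key}_{level}"


def expected_files(output_prefix: str) -> List[str]:
    """Hypothetical helper: the two files assumed to exist after preprocessing."""
    base = config_data_path(output_prefix)
    return [base + ".bin", base + ".idx"]


if __name__ == "__main__":
    prefix = "data/mydataset"  # whatever was passed as the output prefix
    print("config data path:", config_data_path(prefix))
    print("files on disk:", expected_files(prefix))
```

In other words, if preprocessing was run with an output prefix of `data/mydataset`, the config would reference `data/mydataset_text_document`, not `data/mydataset`.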

StellaAthena, Feb 10 '22 14:02

@StellaAthena If no one is working on it, I would love to work on it.

divyanshugit, May 13 '22 07:05

> @StellaAthena If no one is working on it, I would love to work on it.

@divyanshugit If these changes haven't already been merged, then yes, nobody is working on it. Feel free to pick it up :)

StellaAthena, May 13 '22 22:05