aitextgen
Converting GPT-NEO's Colab model for use with aitextgen
Currently, aitextgen cannot train anything larger than the 125M GPT-NEO model (the 350M model is no longer available).
However, the official GPT-NEO Colab notebook trains on TPUs, and with it I successfully trained the 1.3B model! It seems the 2.7B model could work as well.
Is there any way to convert such a model for use with aitextgen? I've tried to find information in the Hugging Face docs, but I can't find a straightforward approach.
Thank you!
Good news! With a point in the right direction from someone on the EleutherAI Discord, I was able to convert my model, and it worked flawlessly for generating text with aitextgen!
I'm a total noob at this stuff, but I can share the Colab notebook I made if it's useful as reference for anyone.
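For anyone who just wants the final step, here's a minimal sketch of loading a converted model with aitextgen (this assumes aitextgen's documented `model_folder` argument; the directory name `pytorch_model` is a placeholder for wherever the conversion wrote its output):

```python
def generate_from_converted_model(model_dir, prompt, n=3):
    """Load a converted GPT-Neo model and print a few samples.

    model_dir should contain the config.json and pytorch_model.bin
    written by the Hugging Face conversion script.
    """
    # Imported inside the helper so it can be defined without aitextgen
    # installed; `pip install aitextgen` is needed to actually run it.
    from aitextgen import aitextgen

    ai = aitextgen(model_folder=model_dir)
    ai.generate(n=n, prompt=prompt, max_length=100)
```

Calling `generate_from_converted_model("pytorch_model", "Once upon a time")` should then print three samples, exactly as if the model had been trained with aitextgen directly.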
I'd be curious to see the notebook. I have not tested as much with the 1.3B/2.7B GPT-Neo models.
First, I used the Colab notebook linked from EleutherAI's GitHub README.
Afterwards, I copied the checkpoints into my own Colab notebook and converted them there. You can check out my notebook here: https://colab.research.google.com/drive/16Mg3bc42VSni7hTJhauJBjg3kZDkgorx?usp=sharing
I have done this successfully using this script: https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt_neo/convert_gpt_neo_mesh_tf_to_pytorch.py
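For reference, the invocation looks roughly like this. The flag names below follow the convention used by Hugging Face's other Mesh-TF conversion scripts and every path is a placeholder, so treat this as a sketch, not the script's authoritative interface. The command is echoed first so it can be reviewed; remove the `echo` to actually run it:

```shell
# All paths are placeholders; point them at your own checkpoint files.
TF_CKPT="GPT3_XL"                 # checkpoint dir copied from the Eleuther Colab
CONFIG="$TF_CKPT/config.json"     # the matching GPT-Neo model config
OUT="pytorch_model"               # destination folder for the PyTorch weights

# Echo first so the command can be inspected; drop `echo` to execute.
echo python convert_gpt_neo_mesh_tf_to_pytorch.py \
    --tf_checkpoint_path "$TF_CKPT" \
    --config_file "$CONFIG" \
    --pytorch_dump_path "$OUT"
```

The output folder can then be passed to aitextgen as its model directory.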