
Converting GPT-NEO's Colab model for use with aitextgen

Open · swcrazyfan opened this issue 4 years ago · 4 comments

Currently, aitextgen cannot train anything larger than the 125M GPT-Neo model (the 350M model is no longer available).
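(For context, the usual aitextgen workflow with the 125M model looks roughly like the sketch below; the model name is the real Hugging Face ID, but the training file and step counts are placeholders.)

```python
from aitextgen import aitextgen

# Load the 125M GPT-Neo model from the Hugging Face Hub -- the largest size
# that currently trains without issue in aitextgen -- and move it to the GPU.
ai = aitextgen(model="EleutherAI/gpt-neo-125M", to_gpu=True)

# Fine-tune on a plain-text file; "input.txt" and the step counts are placeholders.
ai.train("input.txt", num_steps=3000, generate_every=1000, save_every=1000)
```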

However, the official GPT-NEO Colab notebook uses TPUs, and I successfully trained the 1.3B model! It seems the 2.7B model could also work.

Is there any way to convert the model for use with aitextgen? I've tried to find information in the Hugging Face docs, but I can't find a straightforward approach.

Thank you!

swcrazyfan · May 20 '21 04:05

Good news! With a pointer in the right direction from someone on the EleutherAI Discord, I was able to convert my model, and it worked flawlessly for generating text with aitextgen!

I'm a total noob at this stuff, but I can share the Colab notebook I made if it's useful as a reference for anyone.

swcrazyfan · May 20 '21 13:05

I'd be curious to see the notebook. I have not tested as much with the 1.3B/2.7B GPT-Neo models.

minimaxir · May 21 '21 02:05

First, I used the Colab notebook from EleutherAI's GitHub README.

Afterwards, I copied the checkpoints into my own Colab notebook and converted them. You can check out my notebook here: https://colab.research.google.com/drive/16Mg3bc42VSni7hTJhauJBjg3kZDkgorx?usp=sharing
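Once the checkpoint has been converted to the Hugging Face/PyTorch format, loading it back into aitextgen might look roughly like this (the folder name and prompt are placeholders, not the exact values from my notebook):

```python
from aitextgen import aitextgen

# Point aitextgen at the folder holding the converted checkpoint
# (pytorch_model.bin + config.json). GPT-Neo reuses the GPT-2 tokenizer,
# so no custom tokenizer file should be needed.
ai = aitextgen(model_folder="converted_model", to_gpu=True)

# Generate a short sample from the converted 1.3B model.
ai.generate(n=1, prompt="Once upon a time", max_length=100)
```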

swcrazyfan · May 27 '21 05:05

I have done this successfully using this script: https://github.com/huggingface/transformers/blob/master/src/transformers/models/gpt_neo/convert_gpt_neo_mesh_tf_to_pytorch.py
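For reference, a typical invocation from a Colab cell might look something like the sketch below; the paths are placeholders, and the argument names should be double-checked against the script's --help, since they can change between transformers versions.

```python
# Colab/IPython cell: "!" shells out to the conversion script.
# --tf_checkpoint_path : directory holding the Mesh TensorFlow checkpoint
# --config_file        : the GPT-Neo (mesh-tf) config JSON used for training
# --pytorch_dump_path  : where to write pytorch_model.bin and config.json
!python convert_gpt_neo_mesh_tf_to_pytorch.py \
    --tf_checkpoint_path /content/GPTNeo_1.3B \
    --config_file /content/GPTNeo_1.3B/config.json \
    --pytorch_dump_path /content/converted_model
```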

redthing1 · Aug 27 '21 20:08