aitextgen GPT-Neo/GPT-3 Support

Huggingface is adding PyTorch-based GPT-Neo support via https://github.com/huggingface/transformers/pull/10848

That's only for the very large models (1.3B and 2.7B). If performance and support are good (this is the only practical way to get an architecture analogous to GPT-3), I am open to doing the necessary work to add it to aitextgen. It shouldn't be too much work, since the defaults for GPT-2 and GPT-Neo are similar, but I will have to add some config metadata.

Mar 28 '21 03:03 minimaxir

Also depends on DeepSpeed and ONNX support, which won't be automatic.

Mar 28 '21 21:03 minimaxir

Since that PR is now merged and there are already blog posts about finetuning GPT-Neo, I suppose I'll have to add it at some point.

The 1.3B Neo might be fussy; ideally someone will train a smaller GPT-Neo for testing.

Apr 05 '21 04:04 minimaxir

It's now in a released version of Transformers, so I can test it.

There is a released 125M model comparable to GPT-2's 124M model. I'll test whether finetuning works out of the box (it should).

Apr 06 '21 16:04 minimaxir

Because I was stupid and hardcoded GPT2LMHeadModel in a lot of places, this unfortunately doesn't work out of the box.

I probably need to go back to AutoConfig so transformers can infer the correct model.

Apr 10 '21 22:04 minimaxir
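The problem with hardcoding GPT2LMHeadModel is that a GPT-Neo checkpoint needs a different class. A minimal stdlib sketch of the dispatch that the Auto* classes in transformers perform: read the `model_type` field from a model's config.json and map it to a model class. The function name and the two-entry mapping are hypothetical and for illustration only; the `model_type` values (`gpt2`, `gpt_neo`) and class names are the ones used by transformers.

```python
import json

# Hypothetical illustration: choose the causal-LM class from a model's
# config.json "model_type" field instead of hardcoding GPT2LMHeadModel.
# Only the two architectures discussed in this thread are listed.
CAUSAL_LM_CLASSES = {
    "gpt2": "GPT2LMHeadModel",
    "gpt_neo": "GPTNeoForCausalLM",
}


def resolve_causal_lm_class(config_json: str) -> str:
    """Return the causal-LM class name for a serialized config.json."""
    config = json.loads(config_json)
    model_type = config.get("model_type")
    try:
        return CAUSAL_LM_CLASSES[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type: {model_type!r}") from None


if __name__ == "__main__":
    print(resolve_causal_lm_class('{"model_type": "gpt_neo"}'))  # GPTNeoForCausalLM
```

In transformers itself, `AutoConfig.from_pretrained(...)` and `AutoModelForCausalLM.from_pretrained(...)` do this lookup from the checkpoint's config, so code built on them handles both GPT-2 and GPT-Neo without per-architecture branches.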

Just curious: does this still support training GPT-Neo from scratch, like aitextgen does for GPT-2? Specifically, can it be trained on an NVIDIA GPU with 8GB of memory (like a 3060 Ti)?

Apr 25 '21 23:04 lvxiaoc

So it appears there's slightly higher memory overhead when training GPT-Neo (this could also be because it's new and less optimized).

When finetuning the 125M model in Colab, memory usage hit about 10GB of VRAM, so it may not work well on an 8GB GPU. (A 3060 Ti should support fp16, though, so it might work with that.)

Apr 26 '21 16:04 minimaxir
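A back-of-envelope sketch of the fixed memory cost of full finetuning, assuming plain fp32 training with an Adam-style optimizer: 4 bytes of weights, 4 bytes of gradients, and 8 bytes of optimizer moments per parameter. The 125,000,000 figure is the nominal model size, not an exact parameter count, and activations, the CUDA context, and allocator overhead are deliberately not counted.

```python
# Rough static memory for full finetuning in fp32 with an Adam-style
# optimizer (assumption): weights + gradients + two moment buffers.
BYTES_PER_PARAM = {
    "weights": 4,
    "gradients": 4,
    "adam_moments": 8,  # first and second moments, 4 bytes each
}


def static_training_bytes(num_params: int) -> int:
    """Bytes for weights, gradients and optimizer state; excludes activations."""
    return num_params * sum(BYTES_PER_PARAM.values())


if __name__ == "__main__":
    n = 125_000_000  # nominal size of the 125M GPT-Neo model
    total = static_training_bytes(n)
    print(f"{total / 1e9:.1f} GB")  # 2.0 GB
```

Under these assumptions the static part is only about 2GB, which suggests most of the ~10GB observed goes to activations (which grow with batch size and sequence length) and framework overhead. With PyTorch's native mixed precision, parameters and optimizer state typically stay in fp32, so fp16 would mainly help by shrinking activations.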
