
Model release

Open JulianSlzr opened this issue 5 years ago • 16 comments

Great work by the OpenAI team! The paper does not discuss it, so I'll be the first to ask:

What's the release plan for the model definition & weights? Will it be tiered by size, like GPT-2?

JulianSlzr avatar May 29 '20 01:05 JulianSlzr

Yep! Please respond!

Devetec avatar May 29 '20 01:05 Devetec

...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.

The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!
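For reference, a rough sketch of that arithmetic in Python, assuming 4 bytes per parameter (FP32 weights) and ignoring any checkpoint/serialization overhead:

```python
# Back-of-the-envelope weight sizes: parameters x bytes per parameter.
def model_size_gb(n_params: float, bytes_per_param: int = 4) -> float:
    return n_params * bytes_per_param / 1e9

print(model_size_gb(1.5e9))   # GPT-2 1.5B  -> ~6 GB
print(model_size_gb(175e9))   # GPT-3 175B  -> ~700 GB
```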

minimaxir avatar May 29 '20 02:05 minimaxir

I think it’s safe to say I won’t be replicating this one anytime soon

vanyacohen avatar May 29 '20 02:05 vanyacohen

...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.

Sure it is. Artifacts larger than 700GB are distributed all the time. I distribute Danbooru2019 via BitTorrent & rsync and that's like 3300GB! I would not advise distributing GPT-3 via GCP/AWS buckets, to say the least, but it would be easy and cheap ($30/month) to use a dedicated server to seed a GPT-3 torrent, for example.

gwern avatar May 29 '20 02:05 gwern

Not to detract from the difficulties of distributing the model, but the paper notes that training is performed in half precision, which would put the weights at around 350GB on disk.
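The same back-of-the-envelope estimate at half precision (2 bytes per parameter):

```python
# 175B parameters at 2 bytes each (FP16), ignoring serialization overhead.
params = 175e9
print(f"{params * 2 / 1e9:.0f} GB")   # -> ~350 GB
```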

parasj avatar May 29 '20 04:05 parasj

We need distilGPT-3!

Grandiferr avatar May 29 '20 06:05 Grandiferr

By comparison, the Megatron-11B model, trained by Facebook AI in fairseq, is provided as a 19GB tar.gz file hosted on their servers:

https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz

loretoparisi avatar May 29 '20 06:05 loretoparisi

Dang it. It's finally here.

theneuronprogrammer avatar May 29 '20 08:05 theneuronprogrammer

We need distilGPT-3!

maybe we need evaporation-GPT-3

nlp4whp avatar May 29 '20 14:05 nlp4whp

Most of us can hardly dream of using the full model. You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Training with the Adam optimizer (as they mention) would require at least 3 times as many (~66 GPUs), plus extra space for the activations. There are more memory-efficient optimizers though.

But there are 8 models in the paper, 4 of which are smaller than GPT-2, so some of those will probably be useful if OpenAI chooses to release them. 🙂

[image: table of model sizes from the paper]
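A rough sketch of the GPU-count arithmetic above, assuming 16 GB cards, FP16 weights, and Adam roughly tripling the weight memory with its two moment buffers:

```python
import math

params = 175e9
bytes_per_param = 2      # FP16 weights
gpu_mem_gb = 16

weights_gb = params * bytes_per_param / 1e9   # ~350 GB
training_gb = 3 * weights_gb                  # weights + Adam's first and second moments

print(math.ceil(weights_gb / gpu_mem_gb))     # ~22 GPUs just to hold the weights
print(math.ceil(training_gb / gpu_mem_gb))    # ~66 GPUs, before counting activations
```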

AdamDanielKing avatar May 29 '20 14:05 AdamDanielKing

The FP16 point is good; that would mean the smaller models noted above would be even smaller than usual, which is good for everyone!

That may limit the supported hardware unless a way to cast the weights up to FP32 is added (likely something PyTorch can handle).
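For what it's worth, the upcast itself is straightforward in PyTorch; a minimal sketch, assuming the released weights load as an ordinary state dict (the filename here is hypothetical):

```python
import torch

# Hypothetical checkpoint path, for illustration only.
state_dict = torch.load("gpt3_fp16.pt", map_location="cpu")

# Cast floating-point tensors up to FP32 so hardware without fast FP16 can run them.
state_dict = {k: (v.float() if v.is_floating_point() else v)
              for k, v in state_dict.items()}
```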

minimaxir avatar May 29 '20 16:05 minimaxir

Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally? Especially if system rather than GPU memory is used for intermediate computations.

poset avatar May 29 '20 19:05 poset

Big gap between 13B and 175B; there are probably some sweet spots in there for a few folks if something could be made available.

fredbuhl avatar May 29 '20 20:05 fredbuhl

Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally? Especially if system rather than GPU memory is used for intermediate computations.

Technically you could do that, but it would be impractically slow. You'd still need at least 350 GB of RAM (some cloud instances have this) or you'd be waiting for disk -> RAM transfers of 350 GB for each token generated. For a 600 MB/s SSD that would take 10 minutes and cap the output speed at 6 tokens per hour.

With at least 350 GB of RAM the bottleneck would be RAM -> GPU transfers. If the speed is 2.3 GB/s that would take 2.5 minutes. So that caps the possible inference speed at 24 tokens per hour, or somewhere around 50 characters.

Edit: It might be faster to run fully on CPUs using >350 GB RAM than to transfer to the GPU for every token.
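The same throughput arithmetic, written out (a rough model that ignores any overlap of transfer and compute):

```python
WEIGHTS_GB = 350  # FP16 weights for the 175B model

def tokens_per_hour(bandwidth_gb_per_s: float) -> float:
    # Assume the full set of weights must be streamed once per generated token.
    seconds_per_token = WEIGHTS_GB / bandwidth_gb_per_s
    return 3600 / seconds_per_token

print(tokens_per_hour(0.6))   # SSD -> RAM at 600 MB/s: ~6 tokens/hour
print(tokens_per_hour(2.3))   # RAM -> GPU at 2.3 GB/s:  ~24 tokens/hour
```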

AdamDanielKing avatar May 29 '20 20:05 AdamDanielKing

...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.

The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!

Still lower than recent Call of Duty games, so there's that.

ugurkanates avatar May 30 '20 10:05 ugurkanates

Gosh, I would really like to see something put together here to give people more access to this and tool around with it like GPT-2.

If OpenAI released a cloud platform, I would gladly pay to play, even though I have disagreed with the devs in the past about the GPT release format. I think a hosted container system for language models could be the key to OpenAI making money it can put back into research while also being fair to developers.

I really don’t think there is any danger in language models

4R7I5T avatar Jun 02 '20 03:06 4R7I5T