
Any plans to train a 30B model?

Open mtc2013 opened this issue 2 years ago • 5 comments

Are there any plans to train a 30B replica of LLaMA, or is the 7B enough to meet your purposes of comparison?

mtc2013 avatar May 04 '23 15:05 mtc2013

We are definitely interested in replicating the 30B model, but there are no concrete plans yet since we are currently focused on completing the 7B model training.

haoliuhl avatar May 04 '23 17:05 haoliuhl

How much has training the 7B model cost you so far?

lksysML avatar May 06 '23 08:05 lksysML

In the original LLaMA, the sizes were:

  • 7B with 32 Layers
  • 13B with 40 Layers
  • 30B with 60 Layers
  • 65B with 80 Layers

As we all know, there is a really big gap in file sizes between 13B and 30B, and again up to the 65B model. For many of us, the best model we can run is determined by how large a model fits on our own hardware. I would love to see a 50-layer model, which would possibly be around 25B params, and a 70-layer model at around 50B params; a rough parameter estimate is sketched below.
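As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch for LLaMA-style parameter counts (4 attention projections, a SwiGLU MLP with three matrices, plus input/output embeddings; norm weights ignored). The published configurations are real, but the 50-layer intermediate size is a hypothetical interpolation, not an official configuration:

```python
# Rough LLaMA-style parameter estimate:
#   per layer: 4 attention projections (d x d) + SwiGLU MLP with three (d x d_ffn) matrices
#   plus input and output embeddings; norm weights are negligible and ignored.

VOCAB = 32_000

def llama_params(n_layers: int, d_model: int, d_ffn: int) -> float:
    per_layer = 4 * d_model**2 + 3 * d_model * d_ffn
    embeddings = 2 * VOCAB * d_model
    return (n_layers * per_layer + embeddings) / 1e9  # in billions

# Published LLaMA configurations (layers, hidden width, FFN width)
for name, cfg in {
    "7B":  (32, 4096, 11008),
    "13B": (40, 5120, 13824),
    "30B": (60, 6656, 17920),
    "65B": (80, 8192, 22016),
}.items():
    print(f"{name}: ~{llama_params(*cfg):.1f}B params")

# Hypothetical intermediate size: 50 layers at the 30B width
# (purely illustrative; comes out around 27B, in the same ballpark as the ~25B above)
print(f"50 layers @ width 6656: ~{llama_params(50, 6656, 17920):.1f}B params")
```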

Maybe after training the 7B version, it would be nice not to copy exactly the same model sizes, but to train different ones?

maddes8cht avatar May 26 '23 13:05 maddes8cht

Any update on whether training for larger models will eventually happen? Perhaps the TPU Research Cloud could be a free source of compute for the training process, and SlimPajama could be used in place of RedPajama to further accelerate the training.

redbrain avatar Jun 27 '23 19:06 redbrain

+1 on the 33B LLaMA. It performs much better than the 13B one.

qizzzh avatar Jul 16 '23 02:07 qizzzh