
Any plans to train a 30B model?

Open mtc2013 opened this issue 2 years ago • 5 comments

Are there any plans to train a 30B replica of LLaMA, or is the 7B enough to meet your purposes of comparison?

mtc2013 avatar May 04 '23 15:05 mtc2013

We are definitely interested in replicating the 30B model, but there are no concrete plans yet since we are currently focused on completing the 7B model training.

haoliuhl avatar May 04 '23 17:05 haoliuhl

How much has training the 7B model cost you so far?

lksysML avatar May 06 '23 08:05 lksysML

In the original LLaMA, the sizes were:

  • 7B with 32 Layers
  • 13B with 40 Layers
  • 30B with 60 Layers
  • 65B with 80 Layers

As we all know, there is a really big gap in file sizes between 13B and 30B, and again up to the 65B model. For many of us, the best model we can run is determined by how large a model fits on our own hardware. I would love to see a 50-layer model, which would possibly be around 25B params, and a 70-layer model at around 50B params; a rough parameter estimate is sketched below.
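As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch for LLaMA-style parameter counts (4 attention projections, a SwiGLU MLP with three matrices, plus input/output embeddings; norm weights ignored). The published configurations are real, but the 50-layer intermediate size is a hypothetical interpolation, not an official configuration:

```python
# Rough LLaMA-style parameter estimate:
#   per layer: 4 attention projections (d x d) + SwiGLU MLP with three (d x d_ffn) matrices
#   plus input and output embeddings; norm weights are negligible and ignored.

VOCAB = 32_000

def llama_params(n_layers: int, d_model: int, d_ffn: int) -> float:
    per_layer = 4 * d_model**2 + 3 * d_model * d_ffn
    embeddings = 2 * VOCAB * d_model
    return (n_layers * per_layer + embeddings) / 1e9  # in billions

# Published LLaMA configurations (layers, hidden width, FFN width)
for name, cfg in {
    "7B":  (32, 4096, 11008),
    "13B": (40, 5120, 13824),
    "30B": (60, 6656, 17920),
    "65B": (80, 8192, 22016),
}.items():
    print(f"{name}: ~{llama_params(*cfg):.1f}B params")

# Hypothetical intermediate size: 50 layers at the 30B width
# (purely illustrative; comes out around 27B, in the same ballpark as the ~25B above)
print(f"50 layers @ width 6656: ~{llama_params(50, 6656, 17920):.1f}B params")
```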

Maybe after training the 7B version, it would be nice not to copy exactly the same model sizes, but to train different ones?

maddes8cht avatar May 26 '23 13:05 maddes8cht

Any update on whether training for larger models will eventually happen? Perhaps the TPU Research Cloud could be a free source of compute for the training process, and SlimPajama could be used in place of RedPajama to further accelerate the training.

redbrain avatar Jun 27 '23 19:06 redbrain

+1 on the 33B LLaMA. It performs much better than the 13B one.

qizzzh avatar Jul 16 '23 02:07 qizzzh