LlamaGen
Training cost
Thanks for the amazing work! Could you share the training cost for each model, such as the total GPU time and the minimum number of GPUs needed?
Hi~ All our experiments use 80GB A100 GPUs.
| model | params | total batch size | learning rate | epochs | A100 GPUs | training time |
|---|---|---|---|---|---|---|
| tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days |
| LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day |
| LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days |
| LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 16 (8 x 2) | ~3 days |
| LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 32 (8 x 4) | ~4 days |
| LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 32 (8 x 4) | ~5 days |
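For anyone budgeting compute, the table works out to rough A100-day totals per model. A quick sketch of the arithmetic (my own back-of-the-envelope from the rows above, not official figures):

```python
# GPU-day totals implied by the table: GPUs x wall-clock days.
# Values copied from the table; "8 x N" rows expand to 8 * N GPUs.
runs = {
    "tokenizer":    (8,  2),   # 8 GPUs, ~2 days
    "LlamaGen-B":   (8,  1),
    "LlamaGen-L":   (8,  2),
    "LlamaGen-XL":  (16, 3),   # 8 x 2
    "LlamaGen-XXL": (32, 4),   # 8 x 4
    "LlamaGen-3B":  (32, 5),   # 8 x 4
}

for name, (gpus, days) in runs.items():
    print(f"{name:>12}: {gpus * days:4d} A100-days")
```

So the largest run (LlamaGen-3B) comes to roughly 160 A100-days, versus about 8 for LlamaGen-B.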
Do you have numbers for the conditional generation models as well?
Why does it take only one day to train LlamaGen-B with 8 A100s? Is there a special technique? With the same settings, it takes me 2.5 days to run 300 epochs.
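For reference, throughput gaps of this size are often explained by mixed-precision and TF32 settings rather than a special trick. A minimal sketch of the standard PyTorch knobs (an illustrative example only; the model and loss here are placeholders, and this is not confirmed to be LlamaGen's actual configuration):

```python
import torch
import torch.nn.functional as F

# Placeholder model and optimizer, standing in for the real training setup.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# TF32 matmuls are a large, nearly free speedup on A100 vs strict fp32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    # bf16 autocast typically gives ~2x throughput on A100 over fp32,
    # and unlike fp16 it needs no gradient scaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss

# Example step with random data:
x = torch.randn(256, 1024, device="cuda")
print(train_step(x, x).item())
```

If your run uses fp32 throughout, that alone could plausibly account for a 2-2.5x difference in wall-clock time.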