LlamaGen

Training cost

Open liming-ai opened this issue 1 year ago • 3 comments

Thanks for the amazing work! Could you share the training cost for each model, such as the GPU training time and the minimum number of GPUs needed?

liming-ai · Jun 11 '24 17:06

Hi~ All our experiments use 80GB A100 GPUs.

| model | params | total batch size | lr | epochs | GPUs | training time |
|---|---|---|---|---|---|---|
| tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days |
| LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day |
| LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days |
| LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 8 × 2 | ~3 days |
| LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 8 × 4 | ~4 days |
| LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 8 × 4 | ~5 days |

PeizeSun · Jun 11 '24 21:06
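For context (not part of the original replies), a minimal back-of-envelope sketch of what the LlamaGen-B row implies, assuming the class-conditional models are trained on ImageNet-1k (~1.28M training images) and that "total batch size" means the global batch size summed over all GPUs:

```python
# Back-of-envelope check of the throughput implied by the table above.
# ASSUMPTIONS (not stated in the thread): training set is ImageNet-1k
# (~1.28M images) and "total batch size" is the global batch size
# across all GPUs.

IMAGENET_TRAIN_IMAGES = 1_281_167  # assumed training-set size

def implied_training_load(total_bs: int, num_gpus: int, epochs: int, days: float):
    """Per-GPU batch size and the images/sec per GPU needed to finish on schedule."""
    per_gpu_bs = total_bs // num_gpus
    total_images = IMAGENET_TRAIN_IMAGES * epochs
    images_per_sec_per_gpu = total_images / (days * 24 * 3600) / num_gpus
    return per_gpu_bs, images_per_sec_per_gpu

# LlamaGen-B row: total batch size 256, 8 GPUs, 300 epochs, ~1 day
bs, rate = implied_training_load(total_bs=256, num_gpus=8, epochs=300, days=1)
print(bs)           # 32 images per GPU per step
print(round(rate))  # ~556 images/s per A100 to finish 300 epochs in one day
```

By this estimate, finishing 300 epochs in about one day requires roughly 550 images/s per A100, which is the throughput behind the question in the last comment below.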

Do you have numbers for the conditional generation?

isidentical · Jun 12 '24 02:06

Why does it take only one day to train LlamaGen-B with 8 A100s? Is there a special technique? With the same settings, it takes me 2.5 days to run 300 epochs.

GooThinker · Sep 04 '24 08:09