LlamaGen
Training cost
Thanks for the amazing work! Could you share the training cost for each model, such as the total GPU time and the minimum number of GPUs needed?
Hi~ All our experiments use 80GB A100 GPUs.
| model | params | total batch size | learning rate | epochs | A100 GPUs | training time |
|---|---|---|---|---|---|---|
| tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days |
| LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day |
| LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days |
| LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 16 (8 x 2) | ~3 days |
| LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 32 (8 x 4) | ~4 days |
| LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 32 (8 x 4) | ~5 days |
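For anyone budgeting compute, the table works out to rough A100-day totals per model. A quick sketch of the arithmetic (my own back-of-the-envelope from the rows above, not official figures):

```python
# GPU-day totals implied by the table: GPUs x wall-clock days.
# Values copied from the table; "8 x N" rows expand to 8 * N GPUs.
runs = {
    "tokenizer":    (8,  2),   # 8 GPUs, ~2 days
    "LlamaGen-B":   (8,  1),
    "LlamaGen-L":   (8,  2),
    "LlamaGen-XL":  (16, 3),   # 8 x 2
    "LlamaGen-XXL": (32, 4),   # 8 x 4
    "LlamaGen-3B":  (32, 5),   # 8 x 4
}

for name, (gpus, days) in runs.items():
    print(f"{name:>12}: {gpus * days:4d} A100-days")
```

So the largest run (LlamaGen-3B) comes to roughly 160 A100-days, versus about 8 for LlamaGen-B.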
Do you have numbers for the conditional generation models as well?
Why does it take only one day to train LlamaGen-B with 8 A100s? Is there a special technique? With the same settings, it takes me 2.5 days to run 300 epochs.
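For reference, throughput gaps of this size are often explained by mixed-precision and TF32 settings rather than a special trick. A minimal sketch of the standard PyTorch knobs (an illustrative example only; the model and loss here are placeholders, and this is not confirmed to be LlamaGen's actual configuration):

```python
import torch
import torch.nn.functional as F

# Placeholder model and optimizer, standing in for the real training setup.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# TF32 matmuls are a large, nearly free speedup on A100 vs strict fp32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    # bf16 autocast typically gives ~2x throughput on A100 over fp32,
    # and unlike fp16 it needs no gradient scaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss

# Example step with random data:
x = torch.randn(256, 1024, device="cuda")
print(train_step(x, x).item())
```

If your run uses fp32 throughout, that alone could plausibly account for a 2-2.5x difference in wall-clock time.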