
Question about the tokens/second/GPU

P3ngLiu opened this issue on Mar 24 '24 · 1 comment

❓ The question

Thanks for your great work. I have a question about the tokens/second/GPU when training the 7B model. I checked your TRAINLOG.md, and it says the tokens/second/GPU for the 7B model is 1200. But when I check the wandb logs, the tokens/second/GPU for the 7B model is about 2500. How did you get this dramatic improvement?

P3ngLiu · Mar 24 '24

The difference in tokens per second is due to the hardware and the restart behavior during training. The 7B model was trained twice, once on AMD GPUs and once on NVIDIA GPUs. The wandb logs showing ~2500 tokens/second come from the run on NVIDIA GPUs, which are more optimized for this workload. The run on AMD GPUs was split into sessions (e.g. s2) whenever the training process was restarted, which can affect the reported speed in some cases.
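
If you want to verify this yourself, below is a hedged sketch using the public wandb API to pull the throughput series from both runs. The project path, run IDs, and metric key are placeholders and assumptions, not the actual OLMo identifiers; inspect the history of the real runs to find the key the training script actually logs:

```python
import wandb

api = wandb.Api()
# Hypothetical run IDs and project path; substitute the real ones
# from the public OLMo wandb workspace.
for run_id in ["<amd_run_id>", "<nvidia_run_id>"]:
    run = api.run(f"<entity>/<project>/{run_id}")
    # The metric key is an assumption; list the columns returned by
    # run.history() to find the throughput key the speed monitor logs.
    history = run.history(keys=["throughput/device/tokens_per_second"])
    print(run_id, history["throughput/device/tokens_per_second"].mean())
```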

In short, the improvement in tokens per second primarily reflects the different hardware used for each run, together with how each source (the wandb logs vs. TRAINLOG.md) accounts for the training sessions.
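
For intuition, here is a minimal sketch (not OLMo's actual logging code) of how a tokens/second/GPU figure is typically derived. The batch shape, GPU count, and step times below are hypothetical, chosen only to show how two hardware setups running the same configuration can produce the ~1200 vs. ~2500 figures:

```python
def tokens_per_second_per_gpu(global_batch_size: int,
                              sequence_length: int,
                              step_time_s: float,
                              num_gpus: int) -> float:
    """Per-device throughput: tokens processed in one optimizer step,
    divided by wall-clock step time and device count."""
    tokens_per_step = global_batch_size * sequence_length
    return tokens_per_step / step_time_s / num_gpus

# Hypothetical numbers for illustration only: the same batch shape at
# two different step times reproduces the reported gap.
print(tokens_per_second_per_gpu(2048, 2048, 13.7, 256))  # ~1196 (slower cluster)
print(tokens_per_second_per_gpu(2048, 2048, 6.55, 256))  # ~2501 (faster cluster)
```

Note that comparing such figures across runs is only meaningful if both are averaged over comparable windows and the restart sessions are stitched together consistently.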

aman-17 · Oct 19 '24

Hi, thanks again for the inquiry! We’re currently working on closing out old tickets, so we’re closing this out for now, but if you require a follow-up response, please re-open and we will get back to you!

baileykuehl · Jul 01 '25