Question about the tokens/second/GPU
❓ The question
Thanks for your great work. I have a question about the tokens/second/GPU when training the 7B model. I checked your TRAINLOG.md, and it says the tokens/second/GPU for the 7B model is 1200. But when I check the wandb logs, the tokens/second/GPU for the 7B model is about 2500. How did you get this dramatic improvement?
The difference in tokens per second comes down to the hardware and the restart behavior during training. The model was trained twice, once on AMD GPUs and once on NVIDIA GPUs. The wandb logs showing ~2500 tokens/second are from the NVIDIA runs, which are better optimized for this workload. The AMD run was also split into two sessions (s2) when the training process was restarted, which can affect the speed reported for it.

So the gap you see is primarily a combination of the hardware difference between the NVIDIA and AMD runs and how the two sources (wandb vs. TRAINLOG.md) account for the separate training sessions.
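For reference, tokens/second/GPU is typically derived from the training loop's global batch size, sequence length, step time, and GPU count. Here is a minimal sketch of that calculation; the specific numbers below are illustrative assumptions, not values from this run:

```python
# Hypothetical sketch: how tokens/second/GPU is usually computed from
# training-loop quantities. All numbers here are made up for illustration.

def tokens_per_second_per_gpu(global_batch_size: int,
                              seq_len: int,
                              step_time_s: float,
                              num_gpus: int) -> float:
    """Tokens processed per second, normalized per GPU."""
    tokens_per_step = global_batch_size * seq_len
    return tokens_per_step / (step_time_s * num_gpus)

# Example: 1024 sequences of 2048 tokens per step, 7.0 s/step, 256 GPUs.
rate = tokens_per_second_per_gpu(1024, 2048, 7.0, 256)
print(round(rate))  # -> 1170
```

Note that if a run is restarted mid-training, averaging over wall-clock time that includes the gap between sessions will report a lower rate than averaging per-step within each session, which is one way two logging sources can disagree.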
Hi, thanks again for the inquiry! We're currently working on closing out old tickets, so we're closing this one for now, but if you require a follow-up response, please re-open it and we will get back to you!