
Expected iteration speed for the small 125M model

Open jprobichaud opened this issue 1 year ago • 0 comments

Thanks for this project!

I'm currently training the small version on openwebtext with 8 x A100 GPUs, using torch 2.0 (nightly). The data is local to the instance and the machine isn't busy at all. GPUs are at 100% processing capacity, but I'm getting about 560ms per iteration. Does that sound reasonable? Could this be a sign that something is off with torchrun or cuda?

$ torchrun --standalone --nproc_per_node=8 train.py
...
iter 1631: loss 4.0863, time 561.94ms
iter 1632: loss 3.7441, time 563.15ms
iter 1633: loss 3.9848, time 562.11ms
iter 1634: loss 3.8716, time 562.80ms
iter 1635: loss 3.9853, time 561.50ms
iter 1636: loss 3.7735, time 562.73ms
iter 1637: loss 3.8779, time 562.19ms
iter 1638: loss 3.6817, time 562.74ms
iter 1639: loss 3.9185, time 562.66ms
iter 1640: loss 3.6782, time 562.18ms
iter 1641: loss 3.9748, time 567.31ms
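For context, the 562ms/iter figure can be turned into a rough tokens-per-second estimate. This is a minimal sketch assuming nanoGPT's default hyperparameters for the 124M config at the time (batch_size=12, block_size=1024 per GPU) and ignoring gradient accumulation; the actual values aren't shown in the log, so treat the result as an order-of-magnitude check only:

```python
# Rough throughput estimate for the numbers in this issue.
# ASSUMPTIONS (not confirmed from the log): batch_size=12 and
# block_size=1024 per GPU, no gradient accumulation.
def tokens_per_second(n_gpus: int, batch_size: int, block_size: int, iter_ms: float) -> float:
    # Tokens processed per optimizer step across all GPUs.
    tokens_per_iter = n_gpus * batch_size * block_size
    return tokens_per_iter / (iter_ms / 1000.0)

tps = tokens_per_second(n_gpus=8, batch_size=12, block_size=1024, iter_ms=562)
print(f"{tps:,.0f} tokens/sec")  # roughly 175k tokens/sec under these assumptions
```

If gradient accumulation is enabled, multiply the per-iteration token count by the accumulation steps accordingly.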

jprobichaud, Feb 02 '23 15:02