nerfstudio
Multiple GPU performance worse than single GPU
ns-train nerfacto --machine.num-gpus 2 --vis viewer
The output indicates that training is slower with more GPUs. With two GPUs, I see about 60% CPU utilization, and with three GPUs training seems to be CPU-limited.
With one GPU, I see 267 K rays/second. With two, I see 250 K rays/second.
@catid My guess is that our timing logic is incorrect: we may be reporting rays/second for only a single GPU instead of the total across both GPUs (2x the rays). Will investigate.
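To make that hypothesis concrete, here is a minimal sketch of the difference between a per-process rate and an aggregate rate. It assumes every GPU processes the same number of rays per step; the function name is hypothetical and is not part of the nerfstudio API.

```python
def aggregate_rays_per_sec(reported_per_process: float, num_gpus: int) -> float:
    """Scale a per-process throughput to an aggregate one.

    Assumption: each GPU process handles an equal share of rays per step,
    and the reported number covers only one of those processes.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return reported_per_process * num_gpus


# Numbers from this thread: 250 K rays/second reported with two GPUs.
if __name__ == "__main__":
    print(aggregate_rays_per_sec(250_000, 2))
```

If the hypothesis holds, the two-GPU run would correspond to roughly 500 K rays/second in aggregate rather than 250 K. If it does not hold, the reported number already is the total and the slowdown is real.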
It is just broken. I measured 26 seconds with 3 GPUs and 20 seconds with 1 GPU to get to 1000 steps on the poster data.
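For reference, those wall-clock measurements can be turned into step rates with a few lines of arithmetic. This is only a sketch of the calculation; the helper name is hypothetical, and it says nothing about how many rays each step processes per GPU, which the thread does not state.

```python
def steps_per_sec(num_steps: int, wall_seconds: float) -> float:
    """Average optimizer steps per second over a timed run."""
    return num_steps / wall_seconds


# Measurements quoted above: 1000 steps on the poster data.
one_gpu = steps_per_sec(1000, 20.0)    # single-GPU run
three_gpu = steps_per_sec(1000, 26.0)  # three-GPU run

# Ratio > 1 means the three-GPU run takes longer per step.
slowdown = one_gpu / three_gpu

if __name__ == "__main__":
    print(f"1 GPU:  {one_gpu:.1f} steps/s")
    print(f"3 GPUs: {three_gpu:.1f} steps/s")
    print(f"per-step slowdown: {slowdown:.2f}x")
```

Measured this way, the three-GPU run takes about 1.3x as long per step as the single-GPU run.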
Hi @catid, just letting you know that we've started looking into this now that other miscellaneous fixes are out of the way.
Awesome! I'm planning to use this on a cluster of machines, each with between 1 and 3 GPUs.