
Multiple GPU performance worse than single GPU

Open catid opened this issue 3 years ago • 2 comments

ns-train nerfacto --machine.num-gpus 2 --vis viewer

The output indicates that training is slower with more GPUs. With two GPUs I see about 60% CPU utilization, and with three GPUs training seems to be CPU-limited.

With one GPU, I see 267 K rays/second. With two, I see 250 K rays/second.

catid avatar Oct 06 '22 01:10 catid

@catid My guess is that our timing logic is incorrect, where we are only reporting the rays/second for a single GPU as opposed to 2X the rays (for both GPUs). Will investigate.

ethanweber avatar Oct 07 '22 22:10 ethanweber

It is just broken. I measured 26 seconds with 3 GPUs and 20 seconds with 1 GPU to get to 1000 steps on the poster data.

catid avatar Oct 11 '22 08:10 catid
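
For readers comparing the two measurements in this thread, here is a minimal arithmetic sketch. It only restates the numbers quoted above; the function names are hypothetical and are not part of nerfstudio. The first part shows what the two-GPU figure would mean if the reported rays/second were per-GPU, as guessed above. The second part compares the wall-clock step rates, which do not depend on how rays/second is reported.

```python
def aggregate_rays_per_sec(per_gpu_rays_per_sec: float, num_gpus: int) -> float:
    """Total throughput, ASSUMING the reported figure is per-GPU (unconfirmed)."""
    return per_gpu_rays_per_sec * num_gpus


def steps_per_sec(steps: int, seconds: float) -> float:
    """Wall-clock training step rate."""
    return steps / seconds


# Rays/second figures quoted in the issue.
single_gpu = 267_000
dual_gpu_reported = 250_000

# If the two-GPU number is per-GPU, the aggregate would be twice that.
dual_gpu_aggregate = aggregate_rays_per_sec(dual_gpu_reported, 2)
speedup_if_per_gpu = dual_gpu_aggregate / single_gpu

# Wall-clock figures quoted in the issue: time to reach 1000 steps.
rate_1_gpu = steps_per_sec(1000, 20)
rate_3_gpu = steps_per_sec(1000, 26)
step_rate_ratio = rate_3_gpu / rate_1_gpu

print(f"aggregate with 2 GPUs (if per-GPU): {dual_gpu_aggregate:,.0f} rays/s")
print(f"implied speedup over 1 GPU: {speedup_if_per_gpu:.2f}x")
print(f"3-GPU step rate relative to 1 GPU: {step_rate_ratio:.2f}x")
```

The step-rate ratio below 1 says each step takes longer with three GPUs. Whether that is a real slowdown depends on how much work one step does per GPU, which this thread does not state.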

Hi @catid, just letting you know that we've started looking into this now that other miscellaneous fixes are out of the way.

ethanweber avatar Oct 18 '22 07:10 ethanweber

Awesome! I'm planning to use this on a cluster of machines, each with 1 to 3 GPUs.

catid avatar Oct 18 '22 23:10 catid