RAD-NeRF
Training time is too long
Hi, I found that a training step takes 0.07 seconds for the forward pass but 15 seconds for the backward pass, so the total training time would be several days. Is there something wrong with my training code?
This is strange. What environment (e.g., GPU, CUDA, OS) are you using? How do you measure the time?
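For reference, CUDA kernels execute asynchronously, so per-phase timings taken without torch.cuda.synchronize() tend to charge almost all of the time to backward() (or the next synchronizing call), which can make a 0.07 s forward / 15 s backward split misleading. Below is a minimal sketch of synchronized timing; the timed_step helper and its arguments are placeholders for illustration, not part of RAD-NeRF:

```python
import time
import torch

def timed_step(model, optimizer, x, target):
    """Time forward and backward separately with explicit GPU synchronization."""
    torch.cuda.synchronize()
    t0 = time.perf_counter()

    pred = model(x)
    loss = torch.nn.functional.mse_loss(pred, target)

    torch.cuda.synchronize()           # wait for forward kernels before stopping the clock
    t1 = time.perf_counter()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize()           # wait for backward/optimizer kernels
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1            # (forward seconds, backward + step seconds)
```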
This is the training print info:
It will take 27 hours per epoch. The environment is a V100 32G; is that normal?
No... it should take less than 10 hours to finish all epochs on a V100. Are you seeing this slow speed when training other DL models (e.g., ResNet)?
Is there a way to speed up the training process with only 1 GPU?
The current training speed doesn't leave much room for improvement, I guess. Maybe you could increase num_rays and train for fewer steps, but this may sacrifice performance. Also, you could try torch 2.0's compile.
I have CUDA version 11.7, A100 40GB, and I'm having an issue where it's taking 4-5 hours per epoch.
@yediny Could you provide the command you use? If you have enough GPU memory, you could try --preload 2 to see if the speed bottleneck is image loading.
Even with preload applied, it is still the same speed.
Is there any way to make better use of GPU memory for training?
And in the normal case, does it take a day to train one model?
How do I use torch.compile to accelerate training?
It seems we would have to rewrite the nn.Module as a LightningModule and then use compile?
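For what it's worth, torch.compile (torch >= 2.0) works directly on a plain nn.Module, so no LightningModule rewrite is needed. Below is a minimal sketch; TinyMLP is just a stand-in model, not the RAD-NeRF network, and note that custom CUDA extensions (such as the ray-marching ops RAD-NeRF relies on) may cause graph breaks and limit the speedup:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the actual model being trained.
class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3))

    def forward(self, x):
        return self.net(x)

model = TinyMLP().cuda()
model = torch.compile(model)   # requires torch >= 2.0; first call triggers compilation

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4096, 64, device="cuda")
target = torch.rand(4096, 3, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
```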