RAD-NeRF

Training time is too long

Open xingpeima opened this issue 2 years ago • 10 comments

Hi, I found that a training step takes 0.07 seconds for the forward pass but 15 seconds for the backward pass, so the total training time will be several days. Is there something wrong with my training code?

xingpeima avatar Feb 18 '23 05:02 xingpeima

This is strange. What environment (e.g., GPU, CUDA, OS) are you using? How do you measure the time?

ashawkey avatar Feb 18 '23 13:02 ashawkey
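On the question of how the time is measured: CUDA kernels are launched asynchronously, so a timer placed around the forward pass alone may capture only the launch overhead, with the remaining wait attributed to whichever later call blocks. The following is a minimal sketch of a synchronized measurement, not the repository's own timing code. The helper names `_sync` and `time_step`, the MSE loss, and the `nn.Linear` stand-in model are all assumptions made for illustration.

```python
import time

import torch
import torch.nn as nn


def _sync(device):
    # CUDA work is queued asynchronously; wait for it before reading the clock.
    # On CPU there is nothing to wait for.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def time_step(model, x, target):
    """Return (forward_seconds, backward_seconds) for one training step.

    Hypothetical helper: the loss is plain MSE, chosen only for illustration.
    """
    device = x.device
    _sync(device)
    t0 = time.perf_counter()
    loss = nn.functional.mse_loss(model(x), target)
    _sync(device)
    t1 = time.perf_counter()
    loss.backward()
    _sync(device)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


def demo():
    # Stand-in model and random data; replace with the real model and batch.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(8, 2).to(device)
    x = torch.randn(16, 8, device=device)
    target = torch.randn(16, 2, device=device)
    fwd, bwd = time_step(model, x, target)
    print(f"forward: {fwd:.4f}s  backward: {bwd:.4f}s")
```

Calling `demo()` prints one forward/backward split. The first step on a GPU usually includes one-time setup cost, so averaging over several steps after a short warm-up gives a more representative number.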

This is the training log output: [image]

It will take 27 hours per epoch. The environment is a V100 32G. Is that normal?

xingpeima avatar Feb 20 '23 01:02 xingpeima

No... it should take less than 10 hours to finish all epochs on a V100. Do you see the same slowdown when training other DL models (e.g., ResNet)?

ashawkey avatar Feb 20 '23 05:02 ashawkey

Is there a way to speed up the training process with only 1 GPU?

Erickrus avatar Feb 21 '23 02:02 Erickrus

I guess the current training speed doesn't have much room for improvement. Maybe you could increase num_rays and train for fewer steps, but this may sacrifice performance. You could also try PyTorch 2.0's torch.compile.

ashawkey avatar Feb 21 '23 04:02 ashawkey
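The trade-off between num_rays and the number of steps can be made concrete with a small calculation. This is only a sketch under the assumption that the total number of rays seen during training (rays per step times steps) is the quantity to hold constant. The function name and the numbers below are hypothetical and are not the repository's defaults.

```python
def steps_for_ray_budget(base_num_rays, base_steps, new_num_rays):
    """Steps needed so that new_num_rays * steps covers the original ray budget."""
    budget = base_num_rays * base_steps
    # Ceiling division, so the new schedule never sees fewer rays than the old one.
    return -(-budget // new_num_rays)


# Hypothetical values for illustration only.
print(steps_for_ray_budget(65536, 200000, 131072))  # 100000
```

Doubling the rays per step halves the step count, but each step also does more work, so wall-clock time only improves if the GPU was underutilized at the smaller batch. As noted in the comment above, quality may also suffer.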

I have CUDA version 11.7 and an A100 40GB, and training is taking 4-5 hours per epoch. [Screenshot 2023-02-21 1:15:05 PM] [Screenshot 2023-02-21 1:13:59 PM]

yediny avatar Feb 21 '23 04:02 yediny

@yediny Could you provide the command you use? If you have enough GPU memory, you could try to use --preload 2 to see if the speed bottleneck is image loading.

ashawkey avatar Feb 21 '23 09:02 ashawkey

Even with preload applied, the speed is still the same. Is there any way to make better use of GPU memory for training? And in the normal case, does it take a day to train one model? [Screenshot 2023-02-23 2:00:45 PM]

yediny avatar Feb 23 '23 05:02 yediny

How do I use torch.compile to accelerate training?

braintown avatar Jun 28 '23 11:06 braintown

It seems we should rewrite the nn.Module as a LightningModule and then use compile?

braintown avatar Jun 28 '23 11:06 braintown
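For reference, torch.compile (available from PyTorch 2.0) accepts a plain nn.Module, so a rewrite to LightningModule is not required just to use it. Below is a minimal sketch on a toy network. `TinyMLP`, `build_compiled`, and `train_step` are hypothetical names and do not come from the RAD-NeRF code. Whether compilation actually helps this repository is untested here; models that rely on custom CUDA extensions may see limited benefit.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Hypothetical stand-in network; not RAD-NeRF's model."""

    def __init__(self, in_dim=3, hidden=64, out_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)


def build_compiled(model, backend="inductor"):
    # torch.compile exists from PyTorch 2.0 onward; return the plain module otherwise.
    if hasattr(torch, "compile"):
        return torch.compile(model, backend=backend)
    return model


def train_step(model, optimizer, x, target):
    """Run one optimization step and return the scalar loss."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical use would be `model = TinyMLP()`, `compiled = build_compiled(model)`, an optimizer built from `model.parameters()`, and then `train_step(compiled, optimizer, x, target)` inside the loop. The compiled wrapper shares parameters with the original module, and the first call is slower because compilation happens lazily at that point.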