instant-nsr-pl Program freezes when training

Hi, thanks for the great work.

When I tried to train on the synthetic drums data with your framework for the first time, the program froze like this: 20221108163936

I understand that some scripts are compiled the first time the code is run, but it has now been frozen for more than 2 hours. Any advice would be much appreciated. Thanks in advance.

Nov 08 '22 08:11 ZirongChan

Could you please provide the following information:

GPU model & CUDA version
PyTorch version and how you install it (pip or conda)
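
A quick way to gather these details is a short script like the one below. It is only a sketch: `collect_env_info` is a helper name made up for this example, and it assumes PyTorch is importable in the same environment used for training (it falls back gracefully if it is not).

```python
import platform
import sys


def collect_env_info():
    """Return a dict with the Python, PyTorch, CUDA and GPU details asked for above."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a longer report including how the package was installed.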

Nov 08 '22 09:11 bennyguo

Of course. Thanks for your quick reply.

My GPU is a GeForce GTX 1060 (a poor one) with CUDA 11.3. The PyTorch version is 1.12.0 (py3.9_cuda11.3_cudnn8_0), which was installed via pip if I remember correctly. @bennyguo

Nov 08 '22 09:11 ZirongChan

Can you try to replace all the FullyFusedMLP with VanillaMLP in the config file and see if this works? If it still hangs, press ctrl+c and check the stack trace to find where the program is stuck.
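
The replacement can be done by hand in an editor, or with a small script such as the sketch below. It assumes the config is a plain-text file (for example YAML) in which the network type appears literally as `FullyFusedMLP`; `swap_mlp_backend` and the file paths in the usage comment are hypothetical names for illustration, not part of the repository.

```python
from pathlib import Path


def swap_mlp_backend(src, dst, old="FullyFusedMLP", new="VanillaMLP"):
    """Copy the config at `src` to `dst` with every `old` replaced by `new`.

    Returns the number of occurrences that were replaced.
    """
    text = Path(src).read_text(encoding="utf-8")
    count = text.count(old)
    Path(dst).write_text(text.replace(old, new), encoding="utf-8")
    return count


# Hypothetical usage; substitute the path of the config you actually train with:
# n = swap_mlp_backend("configs/my-config.yaml", "configs/my-config-vanilla.yaml")
# print(f"replaced {n} occurrence(s)")
```

Writing to a separate file keeps the original config untouched, so it is easy to switch back once the hang is understood.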

Nov 08 '22 12:11 bennyguo

No, it did not work; I got the very same log as the one I posted. ctrl+c does not work either, which is even weirder. I've also noticed that the code copy operation was not executed since the program got stuck. Maybe I can add some print statements to the Python script; where would you suggest I start?
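
When ctrl+c has no effect, one option that needs no manual print statements is Python's standard `faulthandler` module, which can dump the stack of every thread from a watchdog thread or on a signal. The sketch below is an assumption about how it could be wired in, not something taken from the repository: `enable_hang_diagnostics` is a made-up helper that would be called near the top of the training entry script.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(interval_s=300.0, log_file=sys.stderr):
    """Periodically dump the Python stack of all threads to `log_file`."""
    # Dump tracebacks if the process dies from a fatal signal such as SIGSEGV.
    faulthandler.enable(file=log_file, all_threads=True)
    # A watchdog thread writes all stacks every `interval_s` seconds.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)
    # On Unix, `kill -USR1 <pid>` additionally triggers a dump on demand.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)


# Hypothetical usage at the start of the training script:
# enable_hang_diagnostics(interval_s=300.0)
```

If the same stack keeps appearing in consecutive dumps, its innermost Python frame is a reasonable place to start looking. The dump only shows Python frames, so a hang inside compiled code will appear as the Python call that entered it.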

Nov 09 '22 02:11 ZirongChan

Maybe this is related: https://github.com/KAIR-BAIR/nerfacc/issues/70#issuecomment-1279782194

Nov 09 '22 03:11 liruilong940607

@liruilong940607 Thanks! @ZirongChan Could you try Ruilong's solution and see if it works? If it still does not, try to manually kill the program this time and check the stack trace.

Nov 09 '22 07:11 bennyguo

I ran into the same error. How did you finally fix it?

Jul 11 '24 08:07 badarrrr
