instant-nsr-pl

Program freezes when training

Open ZirongChan opened this issue 2 years ago • 7 comments

Hi, thanks for the great work.

While trying to train on the synthetic drums data with your framework (for the first time), the program freezes like this: [screenshot: 20221108163936]

I understand that some scripts are compiled the first time the code is run, but it has been frozen for more than 2 hours. Any advice would be much appreciated. Thanks in advance.

ZirongChan avatar Nov 08 '22 08:11 ZirongChan

Could you please provide the following information:

  • GPU model & CUDA version
  • PyTorch version and how you install it (pip or conda)
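These details can be gathered with a short script. This is a minimal sketch, assuming PyTorch is importable in the same environment that runs training; `collect_env_info` is a hypothetical helper name, not part of this repository.

```python
import platform


def collect_env_info():
    """Return a dict with the Python, PyTorch, CUDA and GPU details asked for above."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Whether PyTorch came from pip or conda is not reported by this script; `pip list` or `conda list` in the same environment should show that.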

bennyguo avatar Nov 08 '22 09:11 bennyguo

Of course. Thanks for your quick reply.

My GPU is a GeForce GTX 1060 (a weak one) with CUDA 11.3. The PyTorch version is 1.12.0 (build py3.9_cuda11.3_cudnn8_0), which was installed via pip, if I remember correctly. @bennyguo

ZirongChan avatar Nov 08 '22 09:11 ZirongChan

Can you try to replace every FullyFusedMLP with VanillaMLP in the config file and see if this works? If it still hangs, press Ctrl+C and check the stack trace to find where the program gets stuck.
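If the config has many occurrences, the replacement can be scripted. This is only a sketch, assuming the config is a plain-text file in which the network type appears as the literal string `FullyFusedMLP`; `swap_mlp_type` and the `.bak` backup suffix are hypothetical choices made for illustration.

```python
from pathlib import Path


def swap_mlp_type(config_path, old="FullyFusedMLP", new="VanillaMLP"):
    """Replace every occurrence of `old` with `new` in a text config file.

    A backup copy is written next to the original (same name plus ".bak")
    before the file is modified. Returns the number of replacements made.
    """
    path = Path(config_path)
    text = path.read_text()
    count = text.count(old)
    if count:
        # Keep the untouched original so the change is easy to revert.
        path.with_name(path.name + ".bak").write_text(text)
        path.write_text(text.replace(old, new))
    return count
```

Calling `swap_mlp_type("path/to/your_config.yaml")` (path is a placeholder) returns how many entries were changed; a result of 0 means the string was not found and the file was left untouched.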

bennyguo avatar Nov 08 '22 12:11 bennyguo

No, it did not work; I got the very same log as the one I posted. Ctrl+C does not work either, which is even weirder. I've also noticed that the code copy operation was not executed, since the program got stuck. Maybe I can add some debug printing in the Python script; where would you suggest I start?

ZirongChan avatar Nov 09 '22 02:11 ZirongChan

Maybe this is related: https://github.com/KAIR-BAIR/nerfacc/issues/70#issuecomment-1279782194

liruilong940607 avatar Nov 09 '22 03:11 liruilong940607

@liruilong940607 Thanks! @ZirongChan Could you try Ruilong's solution and see if it works? If it still does not, try to manually kill the program this time and check the stack trace.
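Since Ctrl+C is not being handled, one way to still get a stack trace is Python's standard `faulthandler` module, which writes tracebacks directly from the signal handler or from a watchdog thread. The sketch below is an assumption about how this could be wired in, not something the training script already does; `install_hang_diagnostics` and `remove_hang_diagnostics` are hypothetical helper names, and the call would need to be added near the top of the launch script by hand.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(log_file=None, timeout=600.0):
    """Make a hung process report where each Python thread is stuck.

    log_file must be an open file with a real file descriptor
    (defaults to sys.stderr). timeout is in seconds.
    """
    if log_file is None:
        log_file = sys.stderr
    # Dump tracebacks on fatal signals such as SIGSEGV or SIGABRT,
    # so `kill -ABRT <pid>` terminates the process and leaves a trace.
    faulthandler.enable(file=log_file, all_threads=True)
    # Watchdog: write all thread stacks every `timeout` seconds until cancelled.
    faulthandler.dump_traceback_later(timeout, repeat=True, file=log_file)
    # On Unix, `kill -USR1 <pid>` dumps the stacks on demand without killing.
    if hasattr(signal, "SIGUSR1") and hasattr(faulthandler, "register"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)


def remove_hang_diagnostics():
    """Undo everything install_hang_diagnostics set up."""
    faulthandler.cancel_dump_traceback_later()
    if hasattr(signal, "SIGUSR1") and hasattr(faulthandler, "register"):
        faulthandler.unregister(signal.SIGUSR1)
    faulthandler.disable()
```

With `install_hang_diagnostics(timeout=120)` called before training starts, the process prints the stack of every Python thread every two minutes, which should show the last Python frame entered before the hang. The trace only covers Python frames; if the hang is inside a compiled extension, the innermost Python frame shown is the call into that extension.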

bennyguo avatar Nov 09 '22 07:11 bennyguo

I ran into the same problem. How did you fix it in the end?

badarrrr avatar Jul 11 '24 08:07 badarrrr