instant-nsr-pl
Program freezes when training
Hi, thanks for the great work.
As I was trying to train on the synthetic drums data with your framework (for the first time), the program froze like this:
I understand that some scripts are compiled the first time the code is run, but it has been frozen for more than 2 hours. Any advice would be much appreciated. Thanks in advance.
Could you please provide the following information:
- GPU model & CUDA version
- PyTorch version and how you install it (pip or conda)
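A minimal sketch for collecting the details requested above in one go. The helper name `collect_env_info` is hypothetical and not part of instant-nsr-pl; it only uses standard PyTorch attributes and falls back gracefully if PyTorch is not importable.

```python
import platform
import sys


def collect_env_info():
    """Return a dict with the Python, PyTorch, CUDA, and GPU details asked for above."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report, including the pip and conda package lists that help answer the "pip or conda" question.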
Of course. Thanks for your quick reply.
My GPU is a GeForce GTX 1060 (a weak one) with CUDA 11.3. The PyTorch version is 1.12.0 (py3.9_cuda11.3_cudnn8_0), which was installed via pip if I remember correctly. @bennyguo
Can you try replacing every FullyFusedMLP with VanillaMLP in the config file and see if this works? If it still hangs, press Ctrl+C and check the stack trace to find where the program is stuck.
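The swap suggested above can be done with a whole-word text substitution instead of editing each occurrence by hand. This is only a sketch: the function name `use_vanilla_mlp` and the sample config keys are made up for illustration and may not match the real config layout.

```python
import re


def use_vanilla_mlp(config_text):
    """Replace every whole-word FullyFusedMLP with VanillaMLP.

    Returns a (new_text, number_of_replacements) tuple.
    """
    return re.subn(r"\bFullyFusedMLP\b", "VanillaMLP", config_text)


# Hypothetical config fragment; the key names are assumptions for this example.
sample = """\
geometry:
  mlp_network_config:
    otype: FullyFusedMLP
texture:
  mlp_network_config:
    otype: FullyFusedMLP
"""

new_text, count = use_vanilla_mlp(sample)
print(f"replaced {count} occurrence(s)")
print(new_text)
```

To apply it to a real file, read the config with `pathlib.Path(...).read_text()`, pass the text through the function, and write the result to a copy so the original stays untouched.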
No, it did not work; I got the very same log as the one I posted. Ctrl+C does not work either, which is even weirder. I also noticed that the code copy operation had not been executed when the program got stuck. Maybe I can add some debug printing to the Python script; where would you suggest I start?
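When Ctrl+C has no effect, one likely reason is that the main thread is blocked inside native (C/CUDA) code, so Python never gets the chance to raise KeyboardInterrupt. A possible way to still get a stack trace is the standard-library `faulthandler` module, which dumps tracebacks from a watchdog thread or a low-level signal handler. The sketch below assumes it is called near the top of the training entry script; the helper name `enable_hang_diagnostics` and the 10-minute default are arbitrary choices, not part of the framework.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(timeout_s=600, file=None):
    """Dump Python tracebacks of all threads if the process appears stuck.

    `file` must be a real file object with a file descriptor (defaults to stderr).
    """
    if file is None:
        file = sys.stderr
    # Print tracebacks on fatal errors such as segfaults.
    faulthandler.enable(file=file)
    # Every `timeout_s` seconds, dump all thread tracebacks from a watchdog thread.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    # On Unix, also allow an on-demand dump with `kill -USR1 <pid>`.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```

After calling `enable_hang_diagnostics()` at startup, the periodic dump shows the Python line each thread is waiting on, even if the process ignores Ctrl+C. Call `faulthandler.cancel_dump_traceback_later()` once the slow startup phase is over to stop the periodic output.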
Maybe this is related: https://github.com/KAIR-BAIR/nerfacc/issues/70#issuecomment-1279782194
@liruilong940607 Thanks! @ZirongChan Could you try Ruilong's solution and see if it works? If it still does not, try to manually kill the program this time and check the stack trace.
I ran into the same error. How did you fix it in the end?