GPV_Pose
RuntimeError: Function 'DotBackward0' returned nan values in its 0th output.
Hi~ Thank you for releasing the code. When I run the training code, the loss becomes NaN after several epochs. I have tried three times and hit the same problem each time. I did not modify any parameters. Can you give me some advice?
Which version of code are you using? The main branch or the shape prior one?
I encountered this error before. There are two potential reasons:
- The sampled point cloud contains no points because the ground-truth mask is erroneous.
- The bounding box voting process computes a matrix inverse, and a rank-deficient matrix triggers the error. You might need to check the loss value before calling backward on it and mask out NaN.
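For anyone landing here, the two causes and the suggested check can be sketched roughly as below. This is a minimal sketch, not code from this repository: `has_points`, `is_full_rank`, and `backward_if_finite` are hypothetical helper names, and the sketch assumes a scalar loss and a standard `torch.optim` optimizer. It also checks the gradients after `backward()`, since a finite loss can still produce NaN gradients.

```python
import torch


def has_points(points: torch.Tensor) -> bool:
    """Return True if the sampled point cloud, e.g. shape (N, 3), is non-empty."""
    return points.numel() > 0


def is_full_rank(mat: torch.Tensor) -> bool:
    """Return True if a square matrix is full rank, i.e. its inverse exists."""
    return int(torch.linalg.matrix_rank(mat)) == mat.shape[-1]


def backward_if_finite(loss: torch.Tensor, optimizer: torch.optim.Optimizer) -> bool:
    """Step the optimizer only if the loss and all gradients are finite.

    Returns True if a step was taken, False if the batch was skipped.
    """
    optimizer.zero_grad()
    # Skip the batch when the forward loss is already NaN or inf.
    if not torch.isfinite(loss).all():
        return False
    loss.backward()
    # A finite loss can still yield non-finite gradients, so check those too.
    for group in optimizer.param_groups:
        for param in group["params"]:
            if param.grad is not None and not torch.isfinite(param.grad).all():
                optimizer.zero_grad()
                return False
    optimizer.step()
    return True
```

Note that if anomaly detection is enabled (which is what produces the `RuntimeError` in the title), `loss.backward()` raises before the gradient check is reached, so the gradient check only helps with anomaly detection turned off.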
I use the main branch
Thank you for your advice. I will try it
@Bingo-1996 can you post how you solved this? or have you solved it? :) thanks in advance
@lolrudy and @Bingo-1996 Something like this?
# backward
shape = total_loss.shape
total_loss = total_loss.reshape(shape[0], -1)
# Drop all rows containing any nan:
total_loss = total_loss[~torch.any(total_loss.isnan(), dim=1)]
# Reshape back:
total_loss = total_loss.reshape(total_loss.shape[0], *shape[1:])
@HannahHaensen I haven't solved this problem, and there are other problems when I use the shape prior branch. Did you solve the problem?
Yes, this should work. Or you can set all NaN values to 0.
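A rough illustration of the "set NaN to 0" alternative, next to averaging over the valid entries only. The tensor `per_sample_loss` is made-up example data, not output from this repository, and the sketch assumes the loss is available per sample before reduction.

```python
import torch

# Hypothetical per-sample losses; the second entry is NaN.
per_sample_loss = torch.tensor([0.5, float("nan"), 1.5])

# Option 1: replace NaN with 0 so the bad sample contributes nothing to the sum.
zeroed = torch.nan_to_num(per_sample_loss, nan=0.0)

# Option 2: average only over the entries that are not NaN.
valid = ~per_sample_loss.isnan()
mean_valid = per_sample_loss[valid].mean()
```

The two options differ in the resulting mean: zeroing keeps the bad sample in the denominator, while masking drops it. By default `torch.nan_to_num` also replaces infinities with large finite values. Either way, this only cleans the forward value; NaN may still show up in the gradients of the operations that produced it, so fixing the input (empty point cloud, singular matrix) is the safer route.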
@Bingo-1996 Not sure yet. The error occurred for me after ~30 epochs and I have not reached that point again, but if the training passes I can confirm or refute it :) I am on the main branch, not the shape prior one.
@lolrudy thanks!