SASA
One of the variables needed for gradient computation has been modified by an inplace operation:
Hi, I have problems running the code. I installed all the prerequisites successfully, but when I try to run the model with a single GPU it returns an error and I don't know why.
Traceback (most recent call last):
File "train.py", line 201, in <module>
main()
File "train.py", line 173, in main
merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
File "/mnt/ssd/3ddet/SASA/tools/train_utils/train_utils.py", line 94, in train_model
dataloader_iter=dataloader_iter
File "/mnt/ssd/3ddet/SASA/tools/train_utils/train_utils.py", line 41, in train_one_epoch
loss.backward()
File "/home/user/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/user/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 1024, 256, 64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I also tried running with torch.autograd.set_detect_anomaly(True) set, but it still returns similar errors.
How should I solve the problem?
Thanks
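For anyone hitting the same error, here is a minimal standalone sketch (not the SASA code; the tensors and shapes are made up for illustration) that reproduces the same class of failure: an in-place op on the output of a ReLU bumps the tensor's version counter, so ReluBackward0 refuses to run. With torch.autograd.set_detect_anomaly(True) enabled, PyTorch additionally prints a traceback pointing at the forward-pass operation that did the in-place modification, which is how you can locate the offending line in a larger model.

```python
import torch

# Make autograd report the forward-pass op that performed the
# in-place modification, not just the backward op that failed.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(4, requires_grad=True)
y = torch.relu(x)   # ReLU's backward saves its output
y *= 2.0            # in-place op bumps y's version: 0 -> 1

caught = None
try:
    y.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation ... output 0 of ReluBackward0"
    caught = e
```

The anomaly-detection traceback printed before the exception points at the `y *= 2.0` line, which is the signal to replace that op with an out-of-place version.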
I have the same problem with PyTorch 1.10. It seems to be a version issue, since newer versions of PyTorch no longer allow in-place modification of tensors that are needed for gradient computation. I haven't found a solution.
Hi, I have solved this problem.
Newer versions of PyTorch don't allow in-place operations on tensors that are needed for gradient computation, which means the ReLU activations should be set to inplace=False, and operations like '+=' and '-=' should not be applied in the forward pass.
So I changed the original code at line 206 of SASA/pcdet/ops/pointnet2/pointnet2_batch/pointnet2_modules.py:
new_features *= idx_cnt_mask
to
new_features_clone = new_features.clone()
new_features = new_features_clone * idx_cnt_mask
And everything goes well.
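To see why this fix works, here is a minimal sketch (assumed shapes and names, not the actual SASA tensors) mirroring the pattern above: cloning before the multiply leaves the saved activation at version 0, so the backward pass succeeds. A plain out-of-place multiply, new_features = new_features * idx_cnt_mask, would work just as well; the clone simply mirrors the patch shown above.

```python
import torch

x = torch.randn(2, 8, requires_grad=True)
idx_cnt_mask = (torch.rand(2, 8) > 0.5).float()

# Broken variant (would raise in backward):
#   new_features = torch.relu(x); new_features *= idx_cnt_mask
new_features = torch.relu(x)

# Out-of-place fix, mirroring the patch above: the relu output
# that autograd saved is never modified, so its version stays 0.
new_features_clone = new_features.clone()
new_features = new_features_clone * idx_cnt_mask

new_features.sum().backward()  # succeeds; gradients flow through the mask
```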
Regards.