RegTR
A CUDA Error
Dear Yew and other friends,
I ran the code in the environment listed in the README:
Python 3.8.8
PyTorch 1.9.1 with torchvision 0.10.1 (CUDA 11.1)
PyTorch3D 0.6.0
MinkowskiEngine 0.5.4
RTX 3090
But I got the following error:
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    main()
  File "train.py", line 84, in main
    trainer.fit(model, train_loader, val_loader)
  File "/home/***/codes/RegTR-main/src/trainer.py", line 119, in fit
    losses['total'].backward()
  File "/home/***/enter/envs/regtr/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/***/enter/envs/regtr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
I have already tried to set os.environ['CUDA_LAUNCH_BLOCKING'] = '1', but it did not work.
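For reference, here is a minimal sketch of how this variable is usually set. It assumes the variable should be in place before CUDA is initialised, which is most safely done before torch is imported at all. The helper name `enable_blocking_launches` is hypothetical and is not part of RegTR or PyTorch.

```python
import os
import sys


def enable_blocking_launches() -> bool:
    """Request synchronous CUDA kernel launches for this process.

    Returns True if torch had not been imported yet when this was called,
    i.e. the setting is in place before any CUDA initialisation.
    (Hypothetical helper, for illustration only.)
    """
    set_early = "torch" not in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return set_early


if __name__ == "__main__":
    if not enable_blocking_launches():
        print("warning: torch was already imported; set the variable earlier")
    # Import torch and start training only after this point.
```

The same effect can be had from the shell with `CUDA_LAUNCH_BLOCKING=1 python train.py`. Note that this setting is a debugging aid rather than a fix: it is not expected to remove the illegal memory access, only to make the reported traceback line up with the kernel that actually failed.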
This is likely related to the MinkowskiEngine problem noted in other issues, e.g. #1. I hope to resolve it by removing the MinkowskiEngine dependency in the future, but I don't have the time or resources to do so at the moment, and I apologise for that.
In the short term, you may want to change the code to use the non-Minkowski CPU versions.