
A CUDA Error

Open Fzuerzmj opened this issue 2 years ago • 1 comments

Dear Yew and other friends: I ran the code with the same environment as in the README: Python 3.8.8, PyTorch 1.9.1 with torchvision 0.10.1 (CUDA 11.1), PyTorch3D 0.6.0, and MinkowskiEngine 0.5.4, on an RTX 3090.

But I got the following error:

    Traceback (most recent call last):
      File "train.py", line 88, in <module>
        main()
      File "train.py", line 84, in main
        trainer.fit(model, train_loader, val_loader)
      File "/home/***/codes/RegTR-main/src/trainer.py", line 119, in fit
        losses['total'].backward()
      File "/home/***/enter/envs/regtr/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/***/enter/envs/regtr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
    


I have already tried setting os.environ['CUDA_LAUNCH_BLOCKING'] = '1', but the error still occurs.
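For reference, here is a minimal sketch of how this variable is usually set so that it takes effect. It assumes the variable has to be in place before CUDA is initialised, so the conservative choice is to set it at the very top of the entry script, before `torch` is imported. The helper name `enable_blocking_cuda_launches` is hypothetical and not part of RegTR or PyTorch. Note that this setting is a debugging aid: it makes kernel launches synchronous so the traceback points closer to the failing operation, but it is not expected to make the illegal memory access go away.

```python
import os
import sys


def enable_blocking_cuda_launches() -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1 for this process (hypothetical helper).

    Returns True if torch had not yet been imported when the variable was
    set, i.e. the setting was made early enough under the assumption above.
    """
    torch_already_imported = "torch" in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return not torch_already_imported


# Call this at the very top of the entry script, before any torch import.
set_in_time = enable_blocking_cuda_launches()
if not set_in_time:
    print("Warning: torch was imported before CUDA_LAUNCH_BLOCKING was set; "
          "the setting may not take effect.")

# Only import torch (and anything that imports torch) after this point:
# import torch
```

An equivalent option is to set the variable in the shell when launching the script, for example `CUDA_LAUNCH_BLOCKING=1 python train.py`, which avoids any import-order concerns.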

Fzuerzmj, Jan 06 '23 03:01

This is likely related to the issue with Minkowski engine as noted in the other issues, e.g. #1. I do hope to resolve this by removing the Minkowski engine dependency in the future, but I don't have time/resources to do so at the moment and I apologise for that.

In the short term, you may want to change the code to use the non-Minkowski CPU versions.

yewzijian, Jan 06 '23 04:01