Event device type CUDA does not match blocking stream's device type CPU.

Open antal-huck opened this issue 2 years ago • 5 comments

Performing a torch backpropagation after importing deepdrr results in the following error:

Traceback (most recent call last):
  File "/medacta/landmark-detection/src/main.py", line 20, in <module>
    loss.backward()
  File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Event device type CUDA does not match blocking stream's device type CPU.

Running in docker with

FROM nvidia/cuda:11.2.0-cudnn8-devel-ubuntu20.04
...
RUN pip install deepdrr

antal-huck, May 09 '22 16:05

Running pycuda (which deepdrr depends on) and pytorch at the same time is tricky, as these libraries don't play nicely with one another. If it is feasible, it is usually easier to save images offline for later training.

benjamindkilleen, May 09 '22 16:05

Yes, this is feasible, but it would also be cool to generate training data on the fly. I get the same error with the snippet at https://github.com/arcadelab/deepdrr#using-deepdrr-simultaneously-with-pytorch. So my question is: with which setup did this snippet work?

antal-huck, May 09 '22 18:05

It turns out I cannot even import, for example, Volume from deepdrr or Point3D from deepdrr.geo. Such imports alone are enough to trigger the error above. I would need to refactor my project code heavily to separate deepdrr from the dataset.

antal-huck, May 10 '22 07:05

  • pycuda==2018.1.1 seems to work together with torch==1.7.1
  • needed to downgrade from torch==1.11.0 to 1.7.1, other combinations did not work
  • didn't manage to get pycuda==2021.1 to work with newer versions of torch
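
A quick way to check whether an environment matches the combination reported above is sketched below. It only uses the standard library; the helper names (`installed_versions`, `matches_reported`) are made up for illustration, and the pinned versions are simply the ones from this comment, not an official compatibility matrix.

```python
from importlib import metadata

# Versions reported as working together in this thread (an observation, not a guarantee).
REPORTED_WORKING = {"pycuda": "2018.1.1", "torch": "1.7.1"}


def installed_versions(packages):
    """Return {package: installed version, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def matches_reported(found, reported=REPORTED_WORKING):
    """True only if every package is installed at exactly the reported version."""
    # Local build tags such as "+cu110" are stripped before comparing.
    return all(
        found.get(name) is not None and found[name].split("+")[0] == version
        for name, version in reported.items()
    )


if __name__ == "__main__":
    found = installed_versions(REPORTED_WORKING)
    print(found, "matches reported combination:", matches_reported(found))
```

Running this inside the Docker image before training shows at a glance whether the pip-resolved versions drifted from the pair that worked here.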

antal-huck, May 10 '22 14:05

Yeah, I am continuing to have issues getting torch and pycuda to play nicely together. The easiest solution so far, if you have the resources for it, is to cordon them off from one another using two GPUs. PyCUDA will by default grab GPU 0, and you can specify `model.to("cuda:1")` to tell PyTorch to use the other device. If anyone has better solutions, I'd love to incorporate them.
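
As a rough sketch of the two-GPU idea, the helper below picks a device string for PyTorch that avoids the GPU PyCUDA is assumed to hold. `pick_torch_device` and `torch_gpu_count` are hypothetical names, not part of DeepDRR or PyTorch, and the default `pycuda_index=0` just reflects the behavior described in this comment.

```python
def pick_torch_device(visible_gpus, pycuda_index=0):
    """Return a torch device string that avoids the GPU PyCUDA is assumed to use."""
    if visible_gpus <= 0:
        return "cpu"
    for index in range(visible_gpus):
        if index != pycuda_index:
            return f"cuda:{index}"
    # Only one GPU is visible, so sharing it with PyCUDA cannot be avoided.
    return f"cuda:{pycuda_index}"


def torch_gpu_count():
    """Number of GPUs PyTorch can see, or 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    device = pick_torch_device(torch_gpu_count())
    print("PyTorch should use:", device)
    # With a model in hand, this would be followed by: model.to(device)
```

On a single-GPU machine the helper falls back to the shared device, in which case the error from this issue may still occur.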

benjamindkilleen, Jun 26 '22 17:06

Adding this summary:

  • Use import pycuda.autoprimaryctx instead of import pycuda.autoinit, as this attaches to the device's existing primary context rather than creating a separate one. This change has been made in the most recent DeepDRR releases.
  • If possible, give PyCUDA its own GPU device. By default it will take device 0. You can change the "virtual order" with CUDA_VISIBLE_DEVICES=1,0. The PyTorch model can be directed to a particular device with model.to(torch.device("cuda:1")), for example.
  • If that is not possible, try allocating all PyTorch tensors, including the model and everything needed for backprop, before initializing a DeepDRR projector.
  • Consider this related thread on this issue.
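
To make the "virtual order" point concrete, here is a small sketch of what CUDA_VISIBLE_DEVICES=1,0 does to device numbering. The helper `virtual_to_physical` is hypothetical and assumes the variable holds plain comma-separated integer ids (it does not handle GPU UUIDs). The variable is read when CUDA is initialized, so it should be set before any CUDA-using library is imported in the process.

```python
import os


def virtual_to_physical(visible_devices):
    """Map virtual CUDA indices to physical GPU ids for a CUDA_VISIBLE_DEVICES string."""
    ids = [part.strip() for part in visible_devices.split(",") if part.strip()]
    return {virtual: physical for virtual, physical in enumerate(ids)}


# Set this before importing PyCUDA, PyTorch, or DeepDRR in the same process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,0"

mapping = virtual_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
# With "1,0", virtual device 0 (the one PyCUDA takes by default) is physical GPU 1,
# and "cuda:1" in PyTorch refers to physical GPU 0.
print(mapping)
```

Setting the variable on the command line (for example, prefixing the training command with CUDA_VISIBLE_DEVICES=1,0) has the same effect and avoids import-order mistakes.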

benjamindkilleen, Feb 04 '23 13:02

Closing this issue, as the dev branch has now dropped PyCUDA in favor of CuPy across DeepDRR.

benjamindkilleen, Nov 14 '23 15:11