
Cannot access CUDA GPU on WSL

benchd opened this issue 1 year ago · 1 comment

Version

nvidia-dali-cuda120 1.37.1, nvidia-dali-nightly-cuda120 1.38.0.dev20240507

Describe the bug.

I've been following https://github.com/NVIDIA/DALI/issues/4663 and I'm seeing something similar, but I cannot figure out why. I can see my GPU on device 0 with nvidia-smi, and I can use it from the same conda environment with PyTorch, so I'm unclear why DALI cannot. This is inside a conda environment inside WSL on Windows.
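For reference, a minimal check along these lines (a sketch of what I ran; exact output will differ on other machines) confirms PyTorch sees the GPU from the same environment:

```python
import torch

# PyTorch in the same conda environment sees the GPU as device 0
print(torch.cuda.is_available())      # True
print(torch.cuda.device_count())      # 1
print(torch.cuda.get_device_name(0))  # the GPU reported by nvidia-smi
```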

Minimum reproducible example

Conda environment:
name: multilabelimage_model_env
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pytorch
  - torchvision
  - torchaudio
  - pytorch-cuda=12.1
  - opencv
  - pandas
  - scikit-learn=1.4.0
  - wandb
  - matplotlib
  - tqdm
  - pillow
  - numpy
  - scipy
  - pyyaml
  - pip
  - pip:
      - torch-summary
      - tensorboard
      - torch-tb-profiler
      - torch-geometric
      - timm

Installed DALI using the official installation guide:
pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda120
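The wheel itself imports fine; a quick sanity check (a sketch, with the version string taken from the environment above) runs without error, so the failure only appears when building a pipeline:

```python
import nvidia.dali as dali

# DALI imports cleanly; only pipe.build() fails later
print(dali.__version__)  # 1.37.1
```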

Also tried with the nightly build.

Tested with a minimal example:
```python
import nvidia.dali as dali
import numpy as np
@dali.pipeline_def
def my_pipe():
    return dali.fn.external_source(np.array([1, 2, 3], dtype=np.float32), batch=False).gpu()

pipe = my_pipe(batch_size=1, num_threads=1, device_id=1)
pipe.build()
print(pipe.run())
```

Relevant log output

The minimal example above produces this error:

python dali_test.py
/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/nvidia/dali/backend.py:99: Warning: nvidia-dali-cuda120 is no longer shipped with CUDA runtime. You need to install it separately. cuFFT is typically provided with CUDA Toolkit installation or an appropriate wheel. Please check https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#pip-wheels-installation-linux for the reference.
  deprecation_warning(
/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/nvidia/dali/backend.py:110: Warning: nvidia-dali-cuda120 is no longer shipped with CUDA runtime. You need to install it separately. NPP is typically provided with CUDA Toolkit installation or an appropriate wheel. Please check https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#pip-wheels-installation-linux for the reference.
  deprecation_warning(
/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/nvidia/dali/backend.py:121: Warning: nvidia-dali-cuda120 is no longer shipped with CUDA runtime. You need to install it separately. nvJPEG is typically provided with CUDA Toolkit installation or an appropriate wheel. Please check https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#pip-wheels-installation-linux for the reference.
  deprecation_warning(
Traceback (most recent call last):
  File "/mnt/c/Coding/Testing/PyTorch/MultiLabelClassification_Patreon/actual_real_user_code/dali_test.py", line 8, in <module>
    pipe.build()
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/nvidia/dali/pipeline.py", line 979, in build
    self._init_pipeline_backend()
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/nvidia/dali/pipeline.py", line 813, in _init_pipeline_backend
    self._pipe = b.Pipeline(
                 ^^^^^^^^^^^
RuntimeError: CUDA runtime API error cudaErrorInvalidDevice (101):
invalid device ordinal

Other/Misc.

Found similar issues, but could not find a solution.

Check for duplicates

  • [X] I have searched the open bugs/issues and have found no duplicates for this bug report

benchd · May 11 '24 01:05

Hello @benchd, please check your device ID. You said you can access "device 0", but your DALI snippet specifies device 1:

pipe = my_pipe(batch_size=1, num_threads=1, device_id=1)
                                            ^^^^^^^^^^^
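
Since nvidia-smi only shows one GPU, ordinal 0 is the only valid device. A minimal sketch of the corrected call (everything else in your example unchanged):

```python
# a single-GPU machine only exposes CUDA ordinal 0
pipe = my_pipe(batch_size=1, num_threads=1, device_id=0)
pipe.build()
print(pipe.run())
```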

mzient · May 13 '24 07:05