
Issue with DALI Video Reader - CUDA_ERROR_NO_DEVICE

Open kikiyana opened this issue 6 months ago • 8 comments

Describe the question.

CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected Current pipeline object is no longer valid.

I encountered an issue while using DALI to read video frames, and I would greatly appreciate your help in solving it. The specific error I received is:

CUDA driver API error: CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected.

However, my GPU (NVIDIA A800-SXM4-80GB) works fine when running the image loading example provided by DALI, which suggests that the CUDA environment itself is functioning properly. I'm unsure why the device cannot be detected only when processing video data.

Here is my current setup:

  • OS: Linux x86_64 GNU/Linux
  • CUDA version: 11.7
  • DALI version: nvidia-dali-cuda110 == 1.48.0
  • GPU: NVIDIA A800-SXM4-80GB

Please let me know if you need more logs or code snippets to reproduce the issue. Thank you very much for your time and support! Best regards!

nvidia-smi

    Sun Jun 8 02:48:13 2025
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA A800-SXM4-80GB          On  | 00000000:D0:00.0 Off |                    0 |
    | N/A   36C    P0              65W / 500W |      2MiB / 81920MiB |      0%      Default |
    |                                         |                      |             Disabled |
    +-----------------------------------------+----------------------+----------------------+
    |   1  NVIDIA A800-SXM4-80GB          On  | 00000000:D4:00.0 Off |                    0 |
    | N/A   36C    P0              63W / 500W |      2MiB / 81920MiB |      0%      Default |
    |                                         |                      |             Disabled |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+

install

for CUDA 11.0: 'pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda110'

Error

    DALI version: 1.48.0
    [/opt/dali/dali/operators/video/legacy/reader/video_loader.h:179] file_list_include_preceding_frame uses the default value False. In future releases, the default value will be changed to True.
    /opt/dali/dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc:197: Unable to decode file trailer.mp4
    Traceback (most recent call last):
      File "test.py", line 113, in <module>
        sequences = pipe.run()
      File "/root/.local/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 1419, in run
        return self.outputs(cuda_stream)
      File "/root/.local/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 1230, in outputs
        return self._outputs(cuda_stream)
      File "/root/.local/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 1337, in _outputs
        return self._pipe.Outputs(types._raw_cuda_stream_ptr(cuda_stream))
    RuntimeError: Critical error in pipeline:
    Error in GPU operator nvidia.dali.fn.readers.video,
    which was used in the pipeline definition with the following traceback:
      File "test.py", line 98, in video_pipeline
        return fn.readers.video(
    encountered:
    CUDA driver API error CUDA_ERROR_NO_DEVICE (100):
    no CUDA-capable device is detected
    Current pipeline object is no longer valid.

Code:

```python
# Test reading images: succeeds.
from nvidia.dali.pipeline import Pipeline
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

image_dir = "./images/"
max_batch_size = 8


@pipeline_def
def random_rotated_gpu_pipeline():
    jpegs, labels = fn.readers.file(
        file_root=image_dir,
        random_shuffle=True,
        initial_fill=21,
        file_filters=["*.jpg", "*.png"],  # glob patterns
    )
    images = fn.decoders.image(jpegs, device="cpu")
    angle = fn.random.uniform(range=(-10.0, 10.0))
    rotated_images = fn.rotate(images.gpu(), angle=angle, fill_value=0)
    return rotated_images, labels


pipe = random_rotated_gpu_pipeline(
    batch_size=max_batch_size, num_threads=1, device_id=0, seed=1234
)
pipe.build()
pipe_out = pipe.run()
print(pipe_out)
```

```python
# Test reading video: fails with CUDA_ERROR_NO_DEVICE.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import torch

assert torch.cuda.is_available()
import nvidia.dali as dali
print(f"DALI version: {dali.__version__}")  # Expected output: 1.48.0


@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def video_pipeline():
    return fn.readers.video(
        device="gpu",
        filenames=["trailer.mp4"],
        sequence_length=8,
        skip_vfr_check=True,
        dtype=types.UINT8,
        # image_type=types.RGB,
        # enable_frame_num=False
    )


pipe = video_pipeline()
pipe.build()

for i in range(3):
    sequences = pipe.run()
    print(f"Batch {i} frame extraction result (GPU tensor): {sequences[0].as_tensor().shape}")
```

Check for duplicates

  • [x] I have searched the open bugs/issues and have found no duplicates for this bug report

kikiyana avatar Jun 08 '25 03:06 kikiyana

Hi @kikiyana,

Thank you for reaching out.

Could you check if your image pipeline works when you set the device in the decoder operator to mixed? Additionally, are you running the code inside a container or on bare metal?

Based on various discussion threads, this issue might be related to a misconfiguration of your environment. Is there any GPU-related code outside of DALI that you can run successfully in your setup?
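One way to get a data point that does not depend on DALI or PyTorch is to call the CUDA driver API directly. The sketch below is only illustrative: it assumes the driver library is visible under its usual Linux name `libcuda.so.1`, and `result_name` and `probe_driver` are hypothetical helper names. It calls `cuInit` and `cuDeviceGetCount` through `ctypes` and reports the raw result codes, so a code of 100 here would be the same `CUDA_ERROR_NO_DEVICE` that appears in the DALI error.

```python
import ctypes

# A small subset of CUresult values from the CUDA driver API header (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    101: "CUDA_ERROR_INVALID_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def result_name(code):
    """Map a CUresult integer to its symbolic name when it is in the table."""
    return CU_RESULT_NAMES.get(code, f"CUresult({code})")


def probe_driver(libname="libcuda.so.1"):
    """Call cuInit and cuDeviceGetCount directly and return a small report."""
    try:
        cuda = ctypes.CDLL(libname)
    except OSError as exc:
        # The driver library is not visible in this environment at all.
        return {"loaded": False, "error": str(exc)}

    init_code = cuda.cuInit(0)
    report = {"loaded": True, "cuInit": result_name(init_code)}
    if init_code == 0:
        count = ctypes.c_int(0)
        count_code = cuda.cuDeviceGetCount(ctypes.byref(count))
        report["cuDeviceGetCount"] = result_name(count_code)
        report["device_count"] = count.value
    return report


if __name__ == "__main__":
    print(probe_driver())
```

If this probe succeeds while the video reader still fails, that would suggest the problem is limited to the video decoding path rather than to general GPU visibility.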

JanuszL avatar Jun 09 '25 08:06 JanuszL

@JanuszL Thank you for your response! When I ran the image pipeline with the decoder set to mixed, it worked and produced output. However, the video pipeline keeps failing. The output and test code are as follows:

Output:

    Batch 1 shape: (32, 256, 256, 3)
    Batch 2 shape: (32, 256, 256, 3)
    Batch 1 shape: (32, 256, 256, 3)
    Batch 2 shape: (32, 256, 256, 3)
    Batch 1 shape: (32, 256, 256, 3)
    Batch 2 shape: (32, 256, 256, 3)

Code:

```python
import nvidia.dali as dali
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def, types


@pipeline_def
def image_pipeline():
    files, labels = fn.readers.file(file_root="images", random_shuffle=True)
    images = fn.decoders.image(files, device="mixed")
    images = fn.resize(images, resize_x=256, resize_y=256)
    return images, labels


class CorrectedPipeline(dali.Pipeline):
    def __init__(self, batch_size, device_id):
        super().__init__(batch_size, 4, device_id)
        self._input = fn.readers.file(file_root="images")

    def define_graph(self):
        files, labels = self._input
        images = fn.decoders.image(files, device="mixed")
        images = fn.resize(images, resize_x=256, resize_y=256)  # key fix
        return images, labels


def run_pipeline():
    pipe1 = image_pipeline(batch_size=32, num_threads=4, device_id=0)
    pipe1.build()

    pipe2 = CorrectedPipeline(batch_size=32, device_id=0)
    pipe2.build()

    for _ in range(3):
        images1, labels1 = pipe1.run()
        images2, labels2 = pipe2.run()
        print(f"Batch 1 shape: {images1.as_cpu().as_array().shape}")  # (32, 256, 256, 3)
        print(f"Batch 2 shape: {images2.as_cpu().as_array().shape}")  # (32, 256, 256, 3)


if __name__ == "__main__":
    run_pipeline()
```

Additionally, I'm running the code in a Docker container.

Besides DALI, I also use CUDA for model training and inference. Here's a GPU-accelerated example I ran:

Output:

```
Is PyTorch GPU available: True
Current GPU: NVIDIA A800-SXM4-80GB
Number of GPUs: 2
tensor([[[2.1975, 2.1975, 2.1975,  ..., 2.2489, 2.2489, 2.2489],
         [2.1975, 2.1975, 2.1975,  ..., 2.2489, 2.2489, 2.2489],
         [2.1975, 2.1975, 2.1975,  ..., 2.2489, 2.2489, 2.2489],
         ...,
         [2.0777, 2.0948, 2.0605,  ..., 1.8208, 1.8379, 1.8550],
         [2.0605, 2.0777, 2.0777,  ..., 1.8550, 1.8722, 1.8722],
         [2.0948, 2.0948, 2.1119,  ..., 1.9064, 1.8893, 1.8893]],

        [[2.3761, 2.3761, 2.3761,  ..., 2.4286, 2.4286, 2.4286],
         [2.3761, 2.3761, 2.3761,  ..., 2.4286, 2.4286, 2.4286],
         [2.3761, 2.3761, 2.3761,  ..., 2.4286, 2.4286, 2.4286],
         ...,
         [2.1660, 2.1835, 2.1485,  ..., 1.8158, 1.8333, 1.8508],
         [2.1485, 2.1660, 2.1660,  ..., 1.8508, 1.8683, 1.8683],
         [2.1835, 2.1835, 2.2010,  ..., 1.9034, 1.8859, 1.8859]],

        [[2.5877, 2.5877, 2.5877,  ..., 2.6400, 2.6400, 2.6400],
         [2.5877, 2.5877, 2.5877,  ..., 2.6400, 2.6400, 2.6400],
         [2.5877, 2.5877, 2.5877,  ..., 2.6400, 2.6400, 2.6400],
         ...,
         [2.2740, 2.2914, 2.2566,  ..., 1.8557, 1.8731, 1.8905],
         [2.2566, 2.2740, 2.2740,  ..., 1.8905, 1.9080, 1.9080],
         [2.2914, 2.2914, 2.3088,  ..., 1.9428, 1.9254, 1.9254]]],
       device='cuda:0')
```

Code:

```python
import torch
import torchvision.transforms as T
from PIL import Image

print("Is PyTorch GPU available:", torch.cuda.is_available())
print("Current GPU:", torch.cuda.get_device_name(0))
print("Number of GPUs:", torch.cuda.device_count())

device = torch.device('cuda')

transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image_data = Image.open("images/class_0/t2.jpg")

tensor_data = transform(image_data).to(device)
print(tensor_data)
```

I have tried multiple versions such as nvidia-dali-cuda120 1.48.0 with CUDA 12.1, and nvidia-dali-cuda110 1.48.0 with CUDA 11.7. All of them encounter the same issue:

CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected

kikiyana avatar Jun 09 '25 08:06 kikiyana

@JanuszL Even after rebuilding the virtual environment with only Python, PyTorch, the CUDA Toolkit, DALI, and so on installed, the same error still occurs. Conda environment:

```
(test)# pip list

Package                   Version
astunparse                1.6.3
dm-tree                   0.1.8
ffmpeg                    1.4
ffmpeg-python             0.2.0
filelock                  3.16.1
fsspec                    2025.3.0
future                    1.0.0
gast                      0.6.0
Jinja2                    3.1.6
Mako                      1.3.10
MarkupSafe                2.1.5
mpmath                    1.3.0
networkx                  3.1
numpy                     1.24.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         9.1.0.70
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-dali-cuda120       1.48.0
nvidia-nccl-cu12          2.20.5
nvidia-nvimgcodec-cu12    0.4.1.21
nvidia-nvjitlink-cu12     12.9.86
nvidia-nvjpeg2k-cu12      0.8.1.40
nvidia-nvtiff-cu12        0.4.0.62
nvidia-nvtx-cu12          12.1.105
packaging                 24.2
pillow                    10.4.0
pip                       25.0.1
platformdirs              4.3.6
pycuda                    2025.1.1
pytools                   2024.1.14
setuptools                75.1.0
six                       1.16.0
sympy                     1.13.3
torch                     2.4.1
torchaudio                2.4.1
torchvision               0.19.1
triton                    3.0.0
typing_extensions         4.13.2
wheel                     0.44.0

(test) # conda list
# packages in environment at /root/.local/conda/envs/test:
#
# Name            Version     Build            Channel
_libgcc_mutex     0.1         main
_openmp_mutex     5.1         1_gnu
ca-certificates   2025.2.25   h06a4308_0
ld_impl_linux-64  2.40        h12ee557_0
libffi            3.4.4       h6a678d5_1
libgcc-ng         11.2.0      h1234567_1
libgomp           11.2.0      h1234567_1
libstdcxx-ng      11.2.0      h1234567_1
libxcb            1.17.0      h9b100fa_0
ncurses           6.4         h6a678d5_0
openssl           3.0.16      h5eee18b_0
pip               24.2        py38h06a4308_0
pthread-stubs     0.3         h0ce48e5_1
python            3.8.20      he870216_0
readline          8.2         h5eee18b_0
setuptools        75.1.0      py38h06a4308_0
sqlite            3.45.3      h5eee18b_0
tk                8.6.14      h993c535_1
wheel             0.44.0      py38h06a4308_0
xorg-libx11       1.8.12      h9b100fa_1
xorg-libxau       1.0.12      h9b100fa_0
xorg-libxdmcp     1.1.5       h9b100fa_0
xorg-xorgproto    2024.1      h5eee18b_1
xz                5.6.4       h5eee18b_1
zlib              1.2.13      h5eee18b_1
```

kikiyana avatar Jun 09 '25 09:06 kikiyana

@kikiyana,

The error you see comes from one of the methods from the driver API. I can only guess it is returned by the video SDK. Can you check if the video capability is enabled in the container (more details here)?
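A quick way to check this from inside the container is sketched below. It is an illustration under assumptions, not an official diagnostic: `parse_capabilities`, `video_capability_requested`, and `library_loads` are hypothetical helper names, and the sketch assumes that the NVDEC decode library is exposed as `libnvcuvid.so.1` when the video capability is enabled. It inspects `NVIDIA_DRIVER_CAPABILITIES` and then tries to load that library with `ctypes`.

```python
import ctypes
import os


def parse_capabilities(value):
    """Split a NVIDIA_DRIVER_CAPABILITIES value into a set of capability names."""
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def video_capability_requested(value):
    """True when the value names the video capability, or 'all'."""
    caps = parse_capabilities(value)
    return "video" in caps or "all" in caps


def library_loads(libname):
    """True when the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    value = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    print("NVIDIA_DRIVER_CAPABILITIES =", repr(value))
    print("video capability requested:", video_capability_requested(value))
    # Assumption: the video capability makes libnvcuvid.so.1 visible in the container.
    print("libnvcuvid.so.1 loads:", library_loads("libnvcuvid.so.1"))
```

An unset or empty variable is treated here as "video not requested". If the library fails to load, the container was probably started without the video capability, whatever the variable currently says in the shell.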

JanuszL avatar Jun 09 '25 10:06 JanuszL

@JanuszL Hi, I checked the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES in my environment:

(test) root@dtest:~/work/dali# echo $NVIDIA_VISIBLE_DEVICES
GPU-uuid0,GPU-uuid1
(test) root@dtest:~/work/dali# echo $NVIDIA_DRIVER_CAPABILITIES

(test) root@dtest:~/work/dali#

I noticed that the output of echo $NVIDIA_DRIVER_CAPABILITIES was empty. So I tried setting it with:

export NVIDIA_DRIVER_CAPABILITIES=compute,utility,video

to enable video capability support.

However, I'm not sure whether this is the correct way to enable the video capability you mentioned. After making this change, I'm still encountering the same error:

CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected

I would like to confirm whether this approach is correct, or if there are any other underlying issues.

kikiyana avatar Jun 10 '25 02:06 kikiyana

Hi @kikiyana,

What I usually do is invoke Docker with --gpus 'all,"capabilities=compute,utility,video"'. Although -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video should work as well, I'm not sure if executing an export command inside the running container will have any effect.
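The nested quoting in the `--gpus` value is easy to get wrong in a shell, so here is a small sketch that builds both variants as argument lists. It is only an illustration: `my-dali-image` and the helper names are hypothetical, and the `-e` variant assumes GPU access is already configured for the container by other means. As far as I understand, these settings are read when the container is created, which would be why an `export` in an already running container may have no effect.

```python
import shlex

VIDEO_CAPS = ("compute", "utility", "video")


def gpus_flag_args(caps=VIDEO_CAPS):
    """Arguments for the --gpus form: --gpus 'all,"capabilities=..."'."""
    return ["--gpus", 'all,"capabilities={}"'.format(",".join(caps))]


def env_flag_args(caps=VIDEO_CAPS):
    """Arguments for the environment-variable form."""
    return ["-e", "NVIDIA_DRIVER_CAPABILITIES=" + ",".join(caps)]


def docker_run_command(image, command, use_gpus_flag=True):
    """Build a `docker run` argument list with the video capability requested."""
    extra = gpus_flag_args() if use_gpus_flag else env_flag_args()
    return ["docker", "run", "--rm", *extra, image, *command]


if __name__ == "__main__":
    # "my-dali-image" is a placeholder image name, not a real image.
    cmd = docker_run_command("my-dali-image", ["python", "test.py"])
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting entirely, and `shlex.join` prints a form that can be pasted into a shell.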

JanuszL avatar Jun 10 '25 07:06 JanuszL

@JanuszL Well, this doesn't seem to work. I'll try other approaches to see if they succeed. Thanks a lot!

kikiyana avatar Jun 11 '25 07:06 kikiyana

I'm afraid it doesn't fall into any known issues, and more data points are needed to narrow down the problem. I recommend searching online for similar issues and trying different approaches to see what works and what doesn't. It could be very specific to your particular setup.

JanuszL avatar Jun 11 '25 07:06 JanuszL