Issue with DALI Video Reader - CUDA_ERROR_NO_DEVICE
Describe the question.
CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected Current pipeline object is no longer valid.
I encountered an issue while trying to use DALI to read video frames, and I would greatly appreciate your help in solving it. The specific error I received is:
CUDA driver API error: CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected.
However, my GPU (NVIDIA A800-SXM4-80GB) works fine when running the image loading example provided by DALI, which suggests that the CUDA environment itself is functioning properly. I'm unsure why the device cannot be detected only when processing video data.
Here is my current setup:
- OS: Linux x86_64 GNU/Linux
- CUDA version: 11.7
- DALI version: nvidia-dali-cuda110 == 1.48.0
- GPU: NVIDIA A800-SXM4-80GB
Please let me know if you need more logs or code snippets to reproduce the issue. Thank you very much for your time and support! Best regards!
nvidia-smi
```
Sun Jun  8 02:48:13 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A800-SXM4-80GB          On  | 00000000:D0:00.0 Off |                    0 |
| N/A   36C    P0              65W / 500W |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A800-SXM4-80GB          On  | 00000000:D4:00.0 Off |                    0 |
| N/A   36C    P0              63W / 500W |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
install
for CUDA 11.0: 'pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda110'
Error
```
DALI version: 1.48.0
[/opt/dali/dali/operators/video/legacy/reader/video_loader.h:179] file_list_include_preceding_frame uses the default value False. In future releases, the default value will be changed to True.
/opt/dali/dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc:197: Unable to decode file trailer.mp4
Traceback (most recent call last):
  File "test.py", line 113, in nvidia.dali.fn.readers.video, which was used in the pipeline definition with the following traceback:
  File "test.py", line 98, in video_pipeline
    return fn.readers.video(
encountered:
CUDA driver API error CUDA_ERROR_NO_DEVICE (100):
no CUDA-capable device is detected
Current pipeline object is no longer valid.
```
Code:
```python
# Test reading images: works.
from nvidia.dali.pipeline import Pipeline
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

image_dir = "./images/"
max_batch_size = 8


@pipeline_def
def random_rotated_gpu_pipeline():
    jpegs, labels = fn.readers.file(
        file_root=image_dir,
        random_shuffle=True,
        initial_fill=21,
        file_filters=[".jpg", ".png"],
    )
    images = fn.decoders.image(jpegs, device="cpu")
    angle = fn.random.uniform(range=(-10.0, 10.0))
    rotated_images = fn.rotate(images.gpu(), angle=angle, fill_value=0)
    return rotated_images, labels


pipe = random_rotated_gpu_pipeline(
    batch_size=max_batch_size, num_threads=1, device_id=0, seed=1234
)
pipe.build()
pipe_out = pipe.run()
print(pipe_out)
```

```python
# Test reading video: fails.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import torch

assert torch.cuda.is_available()
import nvidia.dali as dali

print(f"DALI version: {dali.__version__}")  # Expected output: 1.48.0


@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def video_pipeline():
    return fn.readers.video(
        device="gpu",
        filenames=["trailer.mp4"],
        sequence_length=8,
        skip_vfr_check=True,
        dtype=types.UINT8,
        # image_type=types.RGB,
        # enable_frame_num=False
    )


pipe = video_pipeline()
pipe.build()

for i in range(3):
    sequences = pipe.run()
    print(f"Batch {i} frame extraction result (GPU tensor): {sequences[0].as_tensor().shape}")
```
Check for duplicates
- [x] I have searched the open bugs/issues and have found no duplicates for this bug report
Hi @kikiyana,
Thank you for reaching out.
Could you check if your image pipeline works when you set the device in the decoder operator to mixed? Additionally, are you running the code inside a container or on bare metal?
Based on various discussion threads, this issue might be related to a misconfiguration of your environment. Is there any GPU-related code outside of DALI that you can run successfully in your setup?
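One way to run a GPU check outside of DALI, and outside of PyTorch as well, is to query the CUDA driver API directly, since that is the layer reporting `CUDA_ERROR_NO_DEVICE`. The following is only a minimal sketch: it assumes the driver library is exposed under the usual Linux name `libcuda.so.1`, and `probe_cuda_driver` is a hypothetical helper name, not part of DALI or CUDA.

```python
import ctypes
from typing import Optional, Tuple


def probe_cuda_driver(lib_name: str = "libcuda.so.1") -> Tuple[Optional[int], int]:
    """Return (status, device_count) as reported by the CUDA driver API.

    status is None if the driver library cannot be loaded at all,
    0 (CUDA_SUCCESS) on success, or the failing CUresult code otherwise
    (100 corresponds to CUDA_ERROR_NO_DEVICE).
    """
    try:
        # Assumption: the driver library is visible under this name.
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None, 0

    # cuInit must succeed before any other driver API call.
    status = libcuda.cuInit(0)
    if status != 0:
        return status, 0

    count = ctypes.c_int(0)
    status = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if status != 0:
        return status, 0
    return 0, count.value


if __name__ == "__main__":
    status, count = probe_cuda_driver()
    print(f"driver status: {status}, device count: {count}")
```

If this reports a status of 0 and the expected number of GPUs inside the container, the basic driver setup is likely fine and the failure is more likely specific to the video decoding path.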
@JanuszL Thank you for your response! When I tried the image pipeline, it worked normally and had output. However, the video pipeline keeps failing. The output results and test code are as follows: Output:
```
Batch 1 shape: (32, 256, 256, 3)
Batch 2 shape: (32, 256, 256, 3)
Batch 1 shape: (32, 256, 256, 3)
Batch 2 shape: (32, 256, 256, 3)
Batch 1 shape: (32, 256, 256, 3)
Batch 2 shape: (32, 256, 256, 3)
```
Code:
```python
import nvidia.dali as dali
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def, types


@pipeline_def
def image_pipeline():
    files, labels = fn.readers.file(file_root="images", random_shuffle=True)
    images = fn.decoders.image(files, device="mixed")
    images = fn.resize(images, resize_x=256, resize_y=256)
    return images, labels


class CorrectedPipeline(dali.Pipeline):
    def __init__(self, batch_size, device_id):
        super().__init__(batch_size, 4, device_id)  # 4 is num_threads
        self._input = fn.readers.file(file_root="images")

    def define_graph(self):
        files, labels = self._input
        images = fn.decoders.image(files, device="mixed")
        images = fn.resize(images, resize_x=256, resize_y=256)  # key fix
        return images, labels


def run_pipeline():
    pipe1 = image_pipeline(batch_size=32, num_threads=4, device_id=0)
    pipe1.build()
    pipe2 = CorrectedPipeline(batch_size=32, device_id=0)
    pipe2.build()
    for _ in range(3):
        images1, labels1 = pipe1.run()
        images2, labels2 = pipe2.run()
        print(f"Batch 1 shape: {images1.as_cpu().as_array().shape}")  # (32, 256, 256, 3)
        print(f"Batch 2 shape: {images2.as_cpu().as_array().shape}")  # (32, 256, 256, 3)


if __name__ == "__main__":
    run_pipeline()
```
Additionally, I'm running the code in a Docker container.
Besides using DALI, I also use CUDA for model training and inference tasks. Here's a GPU-accelerated example I ran:
Output:
```
Is PyTorch GPU available: True
Current GPU: NVIDIA A800-SXM4-80GB
Number of GPUs: 2
tensor([[[2.1975, 2.1975, 2.1975, ..., 2.2489, 2.2489, 2.2489],
         [2.1975, 2.1975, 2.1975, ..., 2.2489, 2.2489, 2.2489],
         [2.1975, 2.1975, 2.1975, ..., 2.2489, 2.2489, 2.2489],
         ...,
         [2.0777, 2.0948, 2.0605, ..., 1.8208, 1.8379, 1.8550],
         [2.0605, 2.0777, 2.0777, ..., 1.8550, 1.8722, 1.8722],
         [2.0948, 2.0948, 2.1119, ..., 1.9064, 1.8893, 1.8893]],

        [[2.3761, 2.3761, 2.3761, ..., 2.4286, 2.4286, 2.4286],
         [2.3761, 2.3761, 2.3761, ..., 2.4286, 2.4286, 2.4286],
         [2.3761, 2.3761, 2.3761, ..., 2.4286, 2.4286, 2.4286],
         ...,
         [2.1660, 2.1835, 2.1485, ..., 1.8158, 1.8333, 1.8508],
         [2.1485, 2.1660, 2.1660, ..., 1.8508, 1.8683, 1.8683],
         [2.1835, 2.1835, 2.2010, ..., 1.9034, 1.8859, 1.8859]],

        [[2.5877, 2.5877, 2.5877, ..., 2.6400, 2.6400, 2.6400],
         [2.5877, 2.5877, 2.5877, ..., 2.6400, 2.6400, 2.6400],
         [2.5877, 2.5877, 2.5877, ..., 2.6400, 2.6400, 2.6400],
         ...,
         [2.2740, 2.2914, 2.2566, ..., 1.8557, 1.8731, 1.8905],
         [2.2566, 2.2740, 2.2740, ..., 1.8905, 1.9080, 1.9080],
         [2.2914, 2.2914, 2.3088, ..., 1.9428, 1.9254, 1.9254]]],
       device='cuda:0')
```
Code:
```python
import torch
import torchvision.transforms as T
from PIL import Image

print("Is PyTorch GPU available:", torch.cuda.is_available())
print("Current GPU:", torch.cuda.get_device_name(0))
print("Number of GPUs:", torch.cuda.device_count())

device = torch.device('cuda')

transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image_data = Image.open("images/class_0/t2.jpg")

tensor_data = transform(image_data).to(device)
print(tensor_data)
```
I have tried multiple versions such as nvidia-dali-cuda120 1.48.0 with CUDA 12.1, and nvidia-dali-cuda110 1.48.0 with CUDA 11.7. All of them encounter the same issue:
CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected
@JanuszL Even after recreating the virtual environment with only Python, PyTorch, CUDA Toolkit, DALI, etc. installed, the same error still occurs. Conda environment:
```
(test) # pip list
Package                   Version
astunparse                1.6.3
dm-tree                   0.1.8
ffmpeg                    1.4
ffmpeg-python             0.2.0
filelock                  3.16.1
fsspec                    2025.3.0
future                    1.0.0
gast                      0.6.0
Jinja2                    3.1.6
Mako                      1.3.10
MarkupSafe                2.1.5
mpmath                    1.3.0
networkx                  3.1
numpy                     1.24.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         9.1.0.70
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-dali-cuda120       1.48.0
nvidia-nccl-cu12          2.20.5
nvidia-nvimgcodec-cu12    0.4.1.21
nvidia-nvjitlink-cu12     12.9.86
nvidia-nvjpeg2k-cu12      0.8.1.40
nvidia-nvtiff-cu12        0.4.0.62
nvidia-nvtx-cu12          12.1.105
packaging                 24.2
pillow                    10.4.0
pip                       25.0.1
platformdirs              4.3.6
pycuda                    2025.1.1
pytools                   2024.1.14
setuptools                75.1.0
six                       1.16.0
sympy                     1.13.3
torch                     2.4.1
torchaudio                2.4.1
torchvision               0.19.1
triton                    3.0.0
typing_extensions         4.13.2
wheel                     0.44.0

(test) # conda list
# packages in environment at /root/.local/conda/envs/test:
#
# Name                    Version       Build            Channel
_libgcc_mutex             0.1           main
_openmp_mutex             5.1           1_gnu
ca-certificates           2025.2.25     h06a4308_0
ld_impl_linux-64          2.40          h12ee557_0
libffi                    3.4.4         h6a678d5_1
libgcc-ng                 11.2.0        h1234567_1
libgomp                   11.2.0        h1234567_1
libstdcxx-ng              11.2.0        h1234567_1
libxcb                    1.17.0        h9b100fa_0
ncurses                   6.4           h6a678d5_0
openssl                   3.0.16        h5eee18b_0
pip                       24.2          py38h06a4308_0
pthread-stubs             0.3           h0ce48e5_1
python                    3.8.20        he870216_0
readline                  8.2           h5eee18b_0
setuptools                75.1.0        py38h06a4308_0
sqlite                    3.45.3        h5eee18b_0
tk                        8.6.14        h993c535_1
wheel                     0.44.0        py38h06a4308_0
xorg-libx11               1.8.12        h9b100fa_1
xorg-libxau               1.0.12        h9b100fa_0
xorg-libxdmcp             1.1.5         h9b100fa_0
xorg-xorgproto            2024.1        h5eee18b_1
xz                        5.6.4         h5eee18b_1
zlib                      1.2.13        h5eee18b_1
```
@kikiyana,
The error you see comes from one of the methods from the driver API. I can only guess it is returned by the video SDK. Can you check if the video capability is enabled in the container (more details here)?
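A rough way to check this from inside the container is sketched below. It looks at two things: whether `NVIDIA_DRIVER_CAPABILITIES` lists `video` (or `all`), and whether the video decoding library can actually be loaded. Both helper names are hypothetical, and the library name `libnvcuvid.so.1` is an assumption about how the NVDEC library is exposed when the video capability is enabled.

```python
import ctypes
import os
from typing import Mapping, Optional


def video_capability_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if NVIDIA_DRIVER_CAPABILITIES lists 'video' or 'all'."""
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = {cap.strip() for cap in raw.split(",") if cap.strip()}
    return bool(caps & {"video", "all"})


def nvcuvid_loadable(lib_name: str = "libnvcuvid.so.1") -> bool:
    """True if the video decoding library can be loaded in this process."""
    try:
        # Assumption: the NVDEC library is exposed under this name.
        ctypes.CDLL(lib_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("video capability requested:", video_capability_requested())
    print("libnvcuvid loadable:", nvcuvid_loadable())
```

The environment variable only shows what the current shell sees, so the library check is probably the more telling of the two: if the library cannot be loaded, the container was most likely started without the video capability.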
@JanuszL Hi,
I checked the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES in my environment:
(test) root@dtest:~/work/dali# echo $NVIDIA_VISIBLE_DEVICES
GPU-uuid0,GPU-uuid1
(test) root@dtest:~/work/dali# echo $NVIDIA_DRIVER_CAPABILITIES
(test) root@dtest:~/work/dali#
I noticed that the output of echo $NVIDIA_DRIVER_CAPABILITIES was empty. So I tried setting it with:
export NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
to enable video capability support.
However, I'm not sure whether this is the correct way to enable the video capability you mentioned. After making this change, I still get the same error:
CUDA driver API error CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected
I would like to confirm whether this approach is correct, or if there are any other underlying issues.
Hi @kikiyana,
What I usually do is invoke Docker with --gpus 'all,"capabilities=compute,utility,video"'. Although -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video should work as well, I'm not sure if executing an export command inside the running container will have any effect.
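For anyone launching the container from a script, the `--gpus` value above is easy to get wrong because of the nested quotes. The sketch below builds the argument list in Python so that no shell quoting is involved; `docker_run_args` is a hypothetical helper, and the image name and command are placeholders rather than anything from this thread.

```python
import shlex
from typing import List, Sequence


def docker_run_args(
    image: str,
    command: Sequence[str],
    capabilities: Sequence[str] = ("compute", "utility", "video"),
) -> List[str]:
    """Build a `docker run` argument list that requests the given GPU capabilities."""
    # The inner double quotes are part of the value: they keep the
    # comma-separated capability list together as a single field.
    gpus_value = 'all,"capabilities={}"'.format(",".join(capabilities))
    return ["docker", "run", "--rm", "--gpus", gpus_value, image, *command]


if __name__ == "__main__":
    # "my-dali-image" and "test.py" are placeholders.
    args = docker_run_args("my-dali-image", ["python", "test.py"])
    print(shlex.join(args))  # shell-quoted form, for copy and paste
    # subprocess.run(args, check=True) would start the container.
```

The capabilities are passed when the container is created, which is why this is done on the `docker run` command line rather than from a shell inside a container that is already running.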
@JanuszL Well, this doesn't seem to work. I'll try other approaches to see if they succeed. Thanks a lot!
I'm afraid it doesn't fall into any known issues, and more data points are needed to narrow down the problem. I recommend searching online for similar issues and trying different approaches to see what works and what doesn't. It could be very specific to your particular setup.