Error with GPU-only Image Decoding in NVIDIA DALI Pipeline
Describe the question.
I’m encountering an issue while running a DALI pipeline with GPU-only decoding. The pipeline works when the fn.decoders.image operator is run in "mixed" mode, but it fails when device="gpu" is used, throwing an error about incompatible device storage for the operator's input. Here is the setup and the error details:
Code:
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline

class SimplePipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, external_data):
        super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
        # external_data yields batches of encoded JPEG bytes and integer labels on the host (CPU)
        self.input = fn.external_source(source=external_data, num_outputs=2,
                                        dtype=[types.UINT8, types.INT32])

    def define_graph(self):
        self.jpegs, self.labels = self.input
        # This works:
        # self.decode = fn.decoders.image(self.jpegs, device="mixed", output_type=types.RGB)
        # This fails with the incompatible-device-storage error:
        self.decode = fn.decoders.image(self.jpegs, device="gpu", output_type=types.RGB)
        self.resize = fn.resize(self.decode, device="gpu", resize_x=1120, resize_y=640)
        self.cmnp = fn.crop_mirror_normalize(
            self.resize, device="gpu", dtype=types.FLOAT, output_layout="CHW",
            crop=(640, 1120), mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0]
        )
        return self.cmnp, self.labels

# batch_size and the `iter` data source are defined earlier in my script
pipe = SimplePipeline(batch_size=batch_size, num_threads=32, device_id=0, external_data=iter)
pipe.build()
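For context, here is a minimal sketch of the kind of iterator I pass as external_data (the file list and helper name are illustrative, not my actual loader); it yields raw, undecoded JPEG bytes and labels as host (CPU) arrays:

import numpy as np

def make_external_data(file_list, batch_size):
    # Illustrative source for fn.external_source(num_outputs=2): each iteration yields
    # (list of encoded JPEG byte arrays, list of int32 labels), both resident on the CPU.
    def iterator():
        i = 0
        while True:
            jpegs, labels = [], []
            for _ in range(batch_size):
                path, label = file_list[i % len(file_list)]
                i += 1
                with open(path, "rb") as f:
                    jpegs.append(np.frombuffer(f.read(), dtype=np.uint8))
                labels.append(np.array([label], dtype=np.int32))
            yield jpegs, labels
    return iterator()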
Error:
RuntimeError: Assert on "IsCompatibleDevice(dev, inp_dev, op_type)" failed:
The input 0 for gpu operator nvidia.dali.fn.decoders.image is stored on incompatible device "cpu". Valid device is "gpu".
GPU and Platform Information:
GPU: NVIDIA RTX 6000 Ada Generation
CUDA Version: 12.2
DALI Version: [specify DALI version if known] (see the version-check snippet after this list)
Driver Version: 535.104.05
System: Running in a Docker container with NVIDIA GPU support enabled
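For reference, the exact DALI build can be confirmed with the snippet below, run inside the same container; I will update the field above once I have it:

# Quick check of the installed DALI version:
import nvidia.dali as dali
print(dali.__version__)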
CUFile GDS Check: Here are the results from running gdscheck:
(base) ➜ tools ./gdscheck -p
warn: error opening log file: Permission denied, logging will be disabled
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Unsupported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Enabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 1024
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 18014398509481980
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 0
execution.max_io_queue_depth : 128
execution.parallel_io : false
execution.min_io_threshold_size_kb : 1024
execution.max_request_parallelism : 0
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA RTX 6000 Ada Generation bar:1 bar size (MiB):65536, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Platform verification succeeded
(base) ➜ tools
Additional Notes: The pipeline works when device="mixed" is used for fn.decoders.image, but switching to device="gpu" triggers the error above. The data comes in through fn.external_source, which produces CPU-resident tensors, and that seems to be the source of the device incompatibility. The goal is to decode directly on the GPU to improve performance.
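To make the comparison concrete, here is a minimal sketch of the two decoder configurations written with pipeline_def (the pure-GPU branch that calls .gpu() on the encoded buffers is an untested assumption on my part, not something the documentation told me to do):

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def(batch_size=4, num_threads=4, device_id=0, seed=12)
def decode_pipe(external_data, pure_gpu=False):
    jpegs, labels = fn.external_source(source=external_data, num_outputs=2,
                                       dtype=[types.UINT8, types.INT32])
    if pure_gpu:
        # Assumption (untested): if the pure-GPU decoder variant expects the encoded
        # buffers to already live in GPU memory, moving them with .gpu() first might
        # satisfy the device check that currently fails.
        images = fn.decoders.image(jpegs.gpu(), device="gpu", output_type=types.RGB)
    else:
        # Known-good path from my report: "mixed" accepts CPU buffers and decodes on the GPU.
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    return images, labels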
Check for duplicates
- [x] I have searched the open bugs/issues and have found no duplicates for this bug report