
Error with GPU-only Image Decoding in NVIDIA DALI Pipeline

Open aafaqin opened this issue 11 months ago • 7 comments

Describe the question.

I’m running into an error with a DALI pipeline that decodes images with device="gpu". The pipeline works when fn.decoders.image uses device="mixed", but with device="gpu" it fails with an error saying the operator's input is stored on an incompatible device. The setup and error details are below:

Code:

from nvidia.dali import fn, types
from nvidia.dali.pipeline import Pipeline

class SimplePipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, external_data):
        super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = fn.external_source(source=external_data, num_outputs=2, dtype=[types.UINT8, types.INT32])

    def define_graph(self):
        self.jpegs, self.labels = self.input
        # This works:
        # self.decode = fn.decoders.image(self.jpegs, device="mixed", output_type=types.RGB)
        
        # This fails with incompatible device storage error:
        self.decode = fn.decoders.image(self.jpegs, device="gpu", output_type=types.RGB)
        self.resize = fn.resize(self.decode, device="gpu", resize_x=1120, resize_y=640)
        
        self.cmnp = fn.crop_mirror_normalize(
            self.resize, device="gpu", dtype=types.FLOAT, output_layout="CHW",
            crop=(640, 1120), mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0]
        )
        return self.cmnp, self.labels

pipe = SimplePipeline(batch_size=batch_size, num_threads=32, device_id=0, external_data=iter)
pipe.build()

Error:

RuntimeError: Assert on "IsCompatibleDevice(dev, inp_dev, op_type)" failed: The input 0 for gpu operator nvidia.dali.fn.decoders.image is stored on incompatible device "cpu". Valid device is "gpu".

GPU and Platform Information:

GPU: NVIDIA RTX 6000 Ada Generation
CUDA Version: 12.2
DALI Version: [specify DALI version if known]
Driver Version: 535.104.05
System: Running in a Docker container with NVIDIA GPU support enabled

CUFile GDS Check: Here are the results from running gdscheck:

(base) ➜  tools ./gdscheck -p
warn: error opening log file: Permission denied, logging will be disabled
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Enabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 1024
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 18014398509481980
 properties.posix_pool_slab_size_kb : 4 1024 16384 
 properties.posix_pool_slab_count : 128 64 32 
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.beegfs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 fs.gpfs.gds_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 execution.max_io_threads : 0
 execution.max_io_queue_depth : 128
 execution.parallel_io : false
 execution.min_io_threshold_size_kb : 1024
 execution.max_request_parallelism : 0
 properties.force_odirect_mode : false
 properties.prefer_iouring : false
 =========
 GPU INFO:
 =========
 GPU index 0 NVIDIA RTX 6000 Ada Generation bar:1 bar size (MiB):65536, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 IOMMU: disabled
 Platform verification succeeded
(base) ➜  tools 

Additional Notes: The only change between the working and the failing pipeline is the device argument of fn.decoders.image: device="mixed" works, device="gpu" raises the error above. The encoded images come from fn.external_source, which I create without a device argument, so I suspect its output is what the error reports as stored on "cpu". The goal is to decode directly on the GPU to optimize performance.
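
A minimal sketch of the placement rule this error seems to point at, written in plain Python rather than DALI code. The PLACEMENT table and the check_chain helper are hypothetical names and a simplified assumption about how operator devices relate to their inputs; they are not DALI's implementation. The sketch assumes the external source hands over CPU data, since no device argument is passed to it in the pipeline above.

```python
# Simplified model of operator device placement; NOT DALI's implementation.
# Maps an operator's `device` argument to (required input device, output device).
PLACEMENT = {
    "cpu": ("cpu", "cpu"),
    "mixed": ("cpu", "gpu"),  # consumes host memory, produces device memory
    "gpu": ("gpu", "gpu"),
}


def check_chain(source_device, op_devices):
    """Walk a linear chain of operators and return the final output device.

    Raises ValueError at the first operator whose input is on the wrong device.
    """
    current = source_device
    for index, op_device in enumerate(op_devices):
        required, produced = PLACEMENT[op_device]
        if current != required:
            raise ValueError(
                f"operator {index} ({op_device!r}) needs {required!r} input, "
                f"got {current!r}"
            )
        current = produced
    return current


# external_source -> decoders.image -> resize -> crop_mirror_normalize
print(check_chain("cpu", ["mixed", "gpu", "gpu"]))  # gpu

try:
    # Same chain, but with the decoder declared as a "gpu" operator.
    check_chain("cpu", ["gpu", "gpu", "gpu"])
except ValueError as err:
    print(err)  # operator 0 ('gpu') needs 'gpu' input, got 'cpu'
```

Under this model the "mixed" decoder is the step that moves data from host to device, which would explain why the rest of the chain can stay on device="gpu" only when the decoder is "mixed".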

Check for duplicates

  • [x] I have searched the open bugs/issues and have found no duplicates for this bug report

aafaqin, Nov 03 '24 20:11