
Can't run DALI with GPU in subprocess

Open · ChangXuSunny opened this issue 2 years ago · 8 comments

Hi, thanks a lot for the great work, which helps a lot in my project. I can run DALI smoothly in CPU/GPU mode in the main process, but when I try to call DALI from a subprocess, I encounter the following error: "Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.". I have checked that the "cuda" device is indeed visible in the subprocess. Could you please take a look at whether something is wrong with my code? Thanks a lot!

Below is the minimal code that reproduces the error:

import gc
import os

import numpy as np
import torch
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.fn as fn

class DaliChecker:
    def __init__(self, device_id, batchsize=1, prefetch=1):
        self.batch_size = batchsize
        self.prefetch = prefetch
        self.device_id = device_id
        if self.device_id is None:
            self.device = "cpu"
        else:
            self.device = "mixed"
        self.make_pipe()
        print("Pid %d build pipeline on device %s" % (os.getpid(), self.device))
        self.pipe.build()

    def make_pipe(self):
        self.pipe = Pipeline(batch_size=self.batch_size, num_threads=2,
                             prefetch_queue_depth=self.prefetch, device_id=self.device_id)
        with self.pipe:
            self.files = fn.external_source()
            images = fn.decoders.image(self.files, device=self.device)
            self.pipe.set_outputs(images)

    def feed(self, images):
        self.pipe.feed_input(self.files, images)

def check_dali(img_folder, dali_batchsize=1):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print("sub proc %d, %s" % (os.getpid(), device))

    def dali_check(img_buffer, dali_decoder):
        try:
            dali_decoder.feed(img_buffer)
            dali_decoder.pipe.run()
            return True
        except Exception:
            # Rebuild the decoder on the same device if decoding fails.
            del dali_decoder
            gc.collect()
            dali_decoder = DaliChecker(0, dali_batchsize)
            return False

    # Build the DALI checker on GPU 0
    dali_decoder = DaliChecker(0)

    img_buffer = []
    for item in os.listdir(img_folder):
        img_file = os.path.join(img_folder, item)

        # Read the raw encoded bytes; DALI's decoder expects uint8 buffers.
        with open(img_file, 'rb') as f:
            img_bytes = f.read()
        img_data = np.frombuffer(img_bytes, dtype=np.uint8)
        img_buffer.append(img_data)

        if len(img_buffer) == dali_batchsize:
            status = dali_check(img_buffer, dali_decoder)
            img_buffer = []

if __name__ == '__main__':
    from multiprocessing import Process
    img_folder = "/path/to/images"  # placeholder: point at a folder of JPEG files

    # Directly call => works!
    # check_dali(img_folder)

    # Call in a sub-process => fails
    p = Process(target=check_dali, args=(img_folder,))
    p.start()
    p.join()

More detailed error message:

Process Process-1:
Traceback (most recent call last):
  File "/home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "TestDali.py", line 87, in check_dali
    dali_decoder = DaliChecker(0)
  File "TestDali.py", line 52, in __init__
    self.pipe.build()
  File "/home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 708, in build
    self._init_pipeline_backend()
  File "/home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 618, in _init_pipeline_backend
    self._default_cuda_stream_priority)
RuntimeError: [/opt/dali/dali/core/device_guard.cc:33] Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.
Stacktrace (61 entries):
[frame 0]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/libdali_core.so(+0x3d41f) [0x7f8df342e41f]
[frame 1]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/libdali_core.so(dali::DeviceGuard::DeviceGuard(int)+0x188) [0x7f8df3433858]
[frame 2]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Init(int, int, int, long, bool, bool, bool, unsigned long, bool, int, int, dali::QueueSizes)+0x4e) [0x7f8ddceeab8e]
[frame 3]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x4f478) [0x7f8df3764478]
[frame 4]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x8f54d) [0x7f8df37a454d]
[frame 5]: python(_PyMethodDef_RawFastCallDict+0x24c) [0x5594da2d871c]
[frame 6]: python(_PyObject_FastCallDict+0x6e) [0x5594da2a8f5e]
[frame 7]: python(+0x12f0c3) [0x5594da2be0c3]
[frame 8]: python(PyObject_Call+0x66) [0x5594da2a97b6]
[frame 9]: python(+0xc239e) [0x5594da25139e]
[frame 10]: python(+0x13a8e7) [0x5594da2c98e7]
[frame 11]: /home/work/anaconda3/envs/MMARecRuntime/lib/python3.7/site-packages/nvidia/dali/backend_impl.cpython-37m-x86_64-linux-gnu.so(+0x8ec62) [0x7f8df37a3c62]
[frame 12]: python(_PyObject_FastCallKeywords+0x15c) [0x5594da30e88c]
[frame 13]: python(+0x1802d1) [0x5594da30f2d1]
[frame 14]: python(_PyEval_EvalFrameDefault+0x48a2) [0x5594da356602]
[frame 15]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 16]: python(+0x1800c5) [0x5594da30f0c5]
[frame 17]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 18]: python(_PyEval_EvalCodeWithName+0x273) [0x5594da2a7bb3]
[frame 19]: python(_PyFunction_FastCallKeywords+0x631) [0x5594da2c81c1]
[frame 20]: python(+0x1800c5) [0x5594da30f0c5]
[frame 21]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 22]: python(_PyEval_EvalCodeWithName+0x273) [0x5594da2a7bb3]
[frame 23]: python(_PyObject_FastCallDict+0x312) [0x5594da2a9202]
[frame 24]: python(+0x186bef) [0x5594da315bef]
[frame 25]: python(_PyObject_FastCallKeywords+0x54c) [0x5594da30ec7c]
[frame 26]: python(_PyEval_EvalFrameDefault+0x47e5) [0x5594da356545]
[frame 27]: python(_PyEval_EvalCodeWithName+0xc5c) [0x5594da2a859c]
[frame 28]: python(_PyFunction_FastCallDict+0x1e6) [0x5594da2c7206]
[frame 29]: python(_PyEval_EvalFrameDefault+0x1d0d) [0x5594da353a6d]
[frame 30]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 31]: python(+0x1800c5) [0x5594da30f0c5]
[frame 32]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 33]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 34]: python(+0x1800c5) [0x5594da30f0c5]
[frame 35]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 36]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 37]: python(+0x1800c5) [0x5594da30f0c5]
[frame 38]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 39]: python(_PyObject_FastCallDict+0x1b6) [0x5594da2a90a6]
[frame 40]: python(+0x186bef) [0x5594da315bef]
[frame 41]: python(_PyObject_FastCallKeywords+0x54c) [0x5594da30ec7c]
[frame 42]: python(_PyEval_EvalFrameDefault+0x47e5) [0x5594da356545]
[frame 43]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 44]: python(+0x1800c5) [0x5594da30f0c5]
[frame 45]: python(_PyEval_EvalFrameDefault+0x48a2) [0x5594da356602]
[frame 46]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 47]: python(+0x1800c5) [0x5594da30f0c5]
[frame 48]: python(_PyEval_EvalFrameDefault+0x48a2) [0x5594da356602]
[frame 49]: python(_PyFunction_FastCallKeywords+0x187) [0x5594da2c7d17]
[frame 50]: python(+0x1800c5) [0x5594da30f0c5]
[frame 51]: python(_PyEval_EvalFrameDefault+0x621) [0x5594da352381]
[frame 52]: python(_PyEval_EvalCodeWithName+0x273) [0x5594da2a7bb3]
[frame 53]: python(PyEval_EvalCode+0x23) [0x5594da2a8ee3]
[frame 54]: python(+0x227802) [0x5594da3b6802]
[frame 55]: python(PyRun_FileExFlags+0x9e) [0x5594da3c094e]
[frame 56]: python(PyRun_SimpleFileExFlags+0x1bb) [0x5594da3c0b3b]
[frame 57]: python(+0x232c3a) [0x5594da3c1c3a]
[frame 58]: python(_Py_UnixMain+0x3c) [0x5594da3c1ccc]
[frame 59]: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f8e18d4b3d5]
[frame 60]: python(+0x1d7555) [0x5594da366555]

ChangXuSunny · Apr 15 '22

Hi @ChangXuSunny,

Based on the error message you get, I think you initialize the CUDA context in the main process before you fork. Unfortunately, once a process has initialized CUDA, it cannot be safely forked. In my case the following code works fine with the latest DALI version (a 'spawn' workaround sketch follows the snippet):

import gc
import os

import numpy as np
import torch
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.fn as fn

class DaliChecker:
    def __init__(self, device_id, batchsize=1, prefetch=1):
        self.batch_size = batchsize
        self.prefetch = prefetch
        self.device_id = device_id
        if self.device_id is None:
            self.device = "cpu"
        else:
            self.device = "mixed"
        self.make_pipe()
        print("Pid %d build pipeline on device %s"%(os.getpid(), self.device))
        self.pipe.build()

    def make_pipe(self):
        self.pipe = Pipeline(batch_size=self.batch_size, num_threads=2, prefetch_queue_depth=self.prefetch, device_id=self.device_id)
        with self.pipe:
            self.files = fn.external_source()
            images = fn.decoders.image(self.files, device=self.device)
            self.pipe.set_outputs(images)

    def feed(self, images):
        self.pipe.feed_input(self.files, images)

def check_dali(img_folder,  dali_batchsize=1):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print("sub proc %d, %s"%(os.getpid(), device))

    def dali_check(img_buffer, dali_decoder):
        try:
            dali_decoder.feed(img_buffer)
            dali_decoder.pipe.run()
            return True
        except Exception:
            del dali_decoder
            gc.collect()
            # rebuild the decoder on the same device (device_id=0)
            dali_decoder = DaliChecker(0, dali_batchsize)
            return False

    #Build Dali Checker with GPU
    dali_decoder = DaliChecker(0)

    img_buffer = []
    for item in os.listdir(img_folder):
        img_file = os.path.join(img_folder, item)

        with open(img_file, 'rb') as f:
            img_bytes = f.read()
        img_data = np.frombuffer(img_bytes, dtype=np.uint8)
        img_buffer.append(img_data)

        if len(img_buffer) == dali_batchsize:
            status = dali_check(img_buffer, dali_decoder)
            img_buffer = []

if __name__ == '__main__':
    from multiprocessing import Process
    #Directly call => works!
    #check_dali(img_folder)

    #Call in sub-process =>failed
    img_folder = os.path.join(os.environ['DALI_EXTRA_PATH'], "db/single/jpeg/100/")
    p = Process(target=check_dali, args=(img_folder,))
    p.start()
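
If the main process has already touched CUDA before forking (for example through an early torch.cuda call), a common workaround, sketched below against the snippet above, is the 'spawn' start method, so the child begins with a clean CUDA state:

import multiprocessing as mp

if __name__ == '__main__':
    # 'spawn' launches a fresh interpreter, so the child initializes
    # CUDA itself instead of inheriting a forked CUDA context.
    mp.set_start_method('spawn', force=True)
    img_folder = os.path.join(os.environ['DALI_EXTRA_PATH'], "db/single/jpeg/100/")
    p = mp.Process(target=check_dali, args=(img_folder,))
    p.start()
    p.join()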

JanuszL · Apr 15 '22

Hi @JanuszL

Thanks for your reply. I have updated my DALI version to 1.12.0 with the command "pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110", but I still get the same error message as before.

My CUDA version is 11.0 and my CUDA driver is 450.51.05.

May I know which DALI version works for you? I didn't explicitly initialize the CUDA environment in the main process. Or do you know how to initialize CUDA after forking in the sub-process?

Thanks a lot!

ChangXuSunny · Apr 18 '22

Hi @JanuszL,

I can get CUDA working in the sub-process by setting "mp.set_start_method('forkserver', force=True)", but then I encounter another error message: "TypeError: can't pickle nvidia.dali.backend_impl.OpSchema objects". Do you know how to solve it?

Thanks a lot!
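
For context, 'forkserver' (like 'spawn') pickles the Process target and its arguments, so the pickle error suggests a DALI object is crossing the process boundary. Below is a minimal sketch that avoids this by creating everything DALI-related inside the child; the folder path and worker name here are illustrative:

import multiprocessing as mp

def worker(img_folder):
    # Import and build DALI only inside the child process: backend
    # objects such as OpSchema cannot be pickled, so nothing
    # DALI-related should be passed as a Process argument or captured
    # from the parent.
    from nvidia.dali.pipeline import Pipeline
    import nvidia.dali.fn as fn

    pipe = Pipeline(batch_size=1, num_threads=2, device_id=0)
    with pipe:
        files = fn.external_source(name="files")
        images = fn.decoders.image(files, device="mixed")
        pipe.set_outputs(images)
    pipe.build()

if __name__ == '__main__':
    mp.set_start_method('forkserver', force=True)
    # Pass only plain picklable arguments (strings, ints, ...).
    p = mp.Process(target=worker, args=("/path/to/images",))
    p.start()
    p.join()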

ChangXuSunny · Apr 18 '22

Hi @ChangXuSunny,

I'm using 1.12 as well:

pip show nvidia-dali-cuda110
Name: nvidia-dali-cuda110
Version: 1.12.0
Summary: NVIDIA DALI  for CUDA 11.0. Git SHA: fcdbcd1f861ce862173f86005203026c072862c1
Home-page: https://github.com/NVIDIA/dali
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.8/dist-packages
Requires:
Required-by:

Have you run the adjusted snippet I provided in https://github.com/NVIDIA/DALI/issues/3826#issuecomment-1100003573? Can you provide the code that runs on your end (including all the necessary imports)?

JanuszL · Apr 19 '22

Hi @JanuszL

I am experiencing a similar problem, but the major difference is that I can't get it running even in the main process. I use CUDA 11.7 and nvidia-dali-cuda110==1.16.1.

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

libcuda.so can be found in /etc/lib and is listed by ldconfig -v
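
A quick sanity check, just a sketch that asks the dynamic loader directly, is to try loading the driver library from Python:

import ctypes

# Raises OSError if the dynamic loader cannot resolve libcuda.so,
# which is essentially the failure DALI reports from cuInitChecked().
ctypes.CDLL("libcuda.so")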

Do you have any idea why this is happening?

eugen-vusak · Sep 07 '22

Hi @eugen-vusak,

Can you capture the nvidia-smi results and provide the exact steps you follow to run DALI, together with the error message?

JanuszL · Sep 07 '22

Thank you for the fast reply!

I'm running the exact same script from the first comment on this issue, just without the sub-process: I'm using the commented-out code for the direct call. I'm trying to run it locally without an NVIDIA GPU, using only the CPU, which, as I gathered from the documentation, should be possible. Because of this I don't have the NVIDIA driver installed.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The result I get from running the script is the following:

sub proc 2480, cpu
Pid 2480 build pipeline on device mixed
Traceback (most recent call last):
  File "/home/evusak/Documents/Photomath/repos/problem-database-analysis/test_dali.py", line 74, in <module>
    check_dali(img_folder)
  File "/home/evusak/Documents/Photomath/repos/problem-database-analysis/test_dali.py", line 53, in check_dali
    dali_decoder = DaliChecker(0)
  File "/home/evusak/Documents/Photomath/repos/problem-database-analysis/test_dali.py", line 19, in __init__
    self.pipe.build()
  File "/home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/pipeline.py", line 825, in build
    self._init_pipeline_backend()
  File "/home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/pipeline.py", line 694, in _init_pipeline_backend
    self._pipe = b.Pipeline(self._max_batch_size,
RuntimeError: [/opt/dali/dali/core/device_guard.cc:33] Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.
Stacktrace (41 entries):
[frame 0]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/libdali_core.so(+0x2db3f) [0x7ff1dca2db3f]
[frame 1]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/libdali_core.so(dali::DeviceGuard::DeviceGuard(int)+0x188) [0x7ff1dca337f8]
[frame 2]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Init(int, int, int, long, bool, bool, bool, unsigned long, bool, int, int, dali::QueueSizes)+0x4e) [0x7ff1ecf6b86e]
[frame 3]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/backend_impl.cpython-39-x86_64-linux-gnu.so(+0x5af2d) [0x7ff1d7e5af2d]
[frame 4]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/nvidia/dali/backend_impl.cpython-39-x86_64-linux-gnu.so(+0xa7dc5) [0x7ff1d7ea7dc5]
[frame 5]: /usr/lib/libpython3.9.so.1.0(+0x127133) [0x7ff24c327133]
[frame 6]: /usr/lib/libpython3.9.so.1.0(_PyObject_MakeTpCall+0xaa) [0x7ff24c2adc5a]
[frame 7]: /usr/lib/libpython3.9.so.1.0(+0x12839a) [0x7ff24c32839a]
[frame 8]: /usr/lib/libpython3.9.so.1.0(+0x166b80) [0x7ff24c366b80]
[frame 9]: /usr/lib/libpython3.9.so.1.0(+0xc20c7) [0x7ff24c2c20c7]
[frame 10]: /home/evusak/.virtualenvs/pm39/lib/python3.9/site-packages/torch/lib/libtorch_python.so(+0x21d7a9) [0x7ff24341d7a9]
[frame 11]: /usr/lib/libpython3.9.so.1.0(_PyObject_MakeTpCall+0xaa) [0x7ff24c2adc5a]
[frame 12]: /usr/lib/libpython3.9.so.1.0(+0x5adfe) [0x7ff24c25adfe]
[frame 13]: /usr/lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x6185) [0x7ff24c261445]
[frame 14]: /usr/lib/libpython3.9.so.1.0(+0x5b203) [0x7ff24c25b203]
[frame 15]: /usr/lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x5ff0) [0x7ff24c2612b0]
[frame 16]: /usr/lib/libpython3.9.so.1.0(+0x180eea) [0x7ff24c380eea]
[frame 17]: /usr/lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb1) [0x7ff24c3814a1]
[frame 18]: /usr/lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x5ff0) [0x7ff24c2612b0]
[frame 19]: /usr/lib/libpython3.9.so.1.0(+0x180eea) [0x7ff24c380eea]
[frame 20]: /usr/lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb1) [0x7ff24c3814a1]
[frame 21]: /usr/lib/libpython3.9.so.1.0(_PyObject_FastCallDictTstate+0x6b) [0x7ff24c3271eb]
[frame 22]: /usr/lib/libpython3.9.so.1.0(_PyObject_Call_Prepend+0x112) [0x7ff24c3273e2]
[frame 23]: /usr/lib/libpython3.9.so.1.0(+0x166be4) [0x7ff24c366be4]
[frame 24]: /usr/lib/libpython3.9.so.1.0(+0xc20c7) [0x7ff24c2c20c7]
[frame 25]: /usr/lib/libpython3.9.so.1.0(_PyObject_MakeTpCall+0xaa) [0x7ff24c2adc5a]
[frame 26]: /usr/lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x59a6) [0x7ff24c260c66]
[frame 27]: /usr/lib/libpython3.9.so.1.0(+0x180eea) [0x7ff24c380eea]
[frame 28]: /usr/lib/libpython3.9.so.1.0(_PyFunction_Vectorcall+0xb1) [0x7ff24c3814a1]
[frame 29]: /usr/lib/libpython3.9.so.1.0(_PyEval_EvalFrameDefault+0x53be) [0x7ff24c26067e]
[frame 30]: /usr/lib/libpython3.9.so.1.0(+0x180eea) [0x7ff24c380eea]
[frame 31]: /usr/lib/libpython3.9.so.1.0(PyEval_EvalCode+0x41) [0x7ff24c381311]
[frame 32]: /usr/lib/libpython3.9.so.1.0(+0x181373) [0x7ff24c381373]
[frame 33]: /usr/lib/libpython3.9.so.1.0(+0x26a476) [0x7ff24c46a476]
[frame 34]: /usr/lib/libpython3.9.so.1.0(+0x26b034) [0x7ff24c46b034]
[frame 35]: /usr/lib/libpython3.9.so.1.0(+0x26b24a) [0x7ff24c46b24a]
[frame 36]: /usr/lib/libpython3.9.so.1.0(PyRun_AnyFileExFlags+0x85) [0x7ff24c46c3e5]
[frame 37]: /usr/lib/libpython3.9.so.1.0(Py_RunMain+0x879) [0x7ff24c46ccd9]
[frame 38]: /usr/lib/libc.so.6(+0x232d0) [0x7ff24c5f42d0]
[frame 39]: /usr/lib/libc.so.6(__libc_start_main+0x8a) [0x7ff24c5f438a]
[frame 40]: /home/evusak/.virtualenvs/pm39/bin/python(_start+0x25) [0x555b2a69e055]

eugen-vusak · Sep 08 '22

Hi @eugen-vusak,

If you don't want to use the GPU and don't have the driver installed, you need to let DALI know that it should be limited to CPU-only pipelines. Please check the device_id documentation section of the Pipeline:

device_id (int, optional, default = -1) – Id of GPU used by the pipeline. A None value for this parameter means that DALI should not use GPU nor CUDA runtime. This limits the pipeline to only CPU operators but allows it to run on any CPU capable machine.
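
For example, a minimal CPU-only sketch (the image path is a placeholder):

import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.fn as fn

# device_id=None tells DALI not to use the GPU or the CUDA runtime at all.
pipe = Pipeline(batch_size=1, num_threads=2, device_id=None)
with pipe:
    files = fn.external_source(name="files")
    images = fn.decoders.image(files, device="cpu")  # "cpu" decoder, not "mixed"
    pipe.set_outputs(images)
pipe.build()

with open("/path/to/image.jpg", "rb") as f:  # placeholder path
    data = np.frombuffer(f.read(), dtype=np.uint8)
pipe.feed_input("files", [data])
images, = pipe.run()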

JanuszL · Sep 08 '22