
[BUG] cvcuda causes error in TensorRT image 23.07

Open hongsamvo opened this issue 9 months ago • 9 comments

I installed cvcuda in the TensorRT image 23.07 (and 23.09) and got the error "CUDA initialization failure with error: 3". I tried installing via the Python wheel file and the deb file (versions 0.13.0 and 0.14.0), but neither solved the problem. Has anyone run into the same issue? Thank you!

hongsamvo avatar Apr 02 '25 07:04 hongsamvo

@hongsamvo We have tested the samples with TensorRT image 24.01-py3 as mentioned in the samples readme.

Generally, CUDA initialization errors have a few common causes:

  • Driver Issues: The installed NVIDIA driver may be outdated, incompatible, or improperly installed.
  • CUDA Toolkit Mismatch: There may be a version mismatch between your CUDA toolkit and the GPU driver.
  • Hardware or Resource Problems: The GPU might be in an error state, or system resources might be insufficient (e.g., running out of memory).
  • Environment Issues: If you're running in a container or a remote desktop session, the GPU might not be accessible as expected. Docker containers need the --gpus flag to make GPUs accessible.

Is nvidia-smi working fine? Does the issue only happen when you install and import cvcuda?
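To narrow this down, a minimal diagnostic along these lines can help (a sketch, not part of the CV-CUDA samples; it only assumes the packages are pip-installed). It tests each import in a fresh interpreter, so a CUDA initialization failure in one import cannot poison the next check:

```python
import subprocess
import sys


def try_import(modules: str) -> tuple[int, str]:
    """Run `import <modules>` in a fresh interpreter; return (returncode, stderr).

    Using a subprocess keeps the parent process clean: a CUDA
    initialization failure in one import cannot affect the next check.
    """
    proc = subprocess.run(
        [sys.executable, "-c", f"import {modules}"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr.strip()


if __name__ == "__main__":
    # Test each suspect alone, then both together (order can matter for
    # libraries that initialize CUDA at import time).
    for mods in ("tensorrt", "cvcuda", "tensorrt, cvcuda"):
        rc, err = try_import(mods)
        last = err.splitlines()[-1] if err else ""
        print(f"import {mods}: {'OK' if rc == 0 else 'FAILED: ' + last}")
```

If "import cvcuda" alone fails here but nvidia-smi works, that points at the import step rather than the installation.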

dsuthar-nvidia avatar Apr 03 '25 01:04 dsuthar-nvidia

Thank you for the answer! My machine has a GTX 3060, Driver Version: 535.183.01, CUDA Version: 12.2. tensorrt:23.07-py3 ships CUDA Toolkit 12.1, so I think everything matches the cvcuda_cu12 requirements. I also noticed that the error only happens when I install cvcuda during docker build; if I go inside the running container and install cvcuda there, the error does not happen. nvidia-smi works fine on my machine.

hongsamvo avatar Apr 03 '25 01:04 hongsamvo

I updated the machine's driver to version 550 and used the TensorRT image 24.01-py3, but I got the same error. These are the commands in my Dockerfile:

# load base image tensorrt
FROM nvcr.io/nvidia/tensorrt:24.01-py3

# install opencv
RUN pip install opencv-python

# fix missing libs for opencv
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 curl -y

# install newest torch
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# install other libs
RUN pip install psutil minio flask flask_cors requests zmq pickle5 pyyaml cryptography redis shapely scipy lap sentry_sdk easydict filterpy Cython cvcuda_cu12

hongsamvo avatar Apr 03 '25 06:04 hongsamvo

@hongsamvo I tested the following and it seems to be working fine for me:

Dockerfile:

FROM nvcr.io/nvidia/tensorrt:24.01-py3
RUN pip install opencv-python
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 curl -y
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
RUN pip install psutil minio flask flask_cors requests zmq pickle5 pyyaml cryptography redis shapely scipy lap sentry_sdk easydict filterpy Cython cvcuda_cu12

RUN pip3 install cvcuda-cu12

Installing cvcuda should not require running any GPU code, so it should work during docker build; docker build does not run with GPU access.

  1. Can you try building the above dockerfile?
  2. What docker and nvidia container toolkit versions do you have?

dsuthar-nvidia avatar Apr 03 '25 22:04 dsuthar-nvidia

@dsuthar-nvidia Thank you for your kind support. I forgot to mention that the error happens when I start the container and some TRT model plans are being loaded. I suspect that cvcuda creates some processes that may conflict with the processes loading the TRT plans. My docker version is 26.1.1 and my nvidia container toolkit version is 1.13.5.

hongsamvo avatar Apr 04 '25 01:04 hongsamvo

My project structure is something like this. Each TRT model is modularized as a class:

import cvcuda
from trt_infer import TRTInference

class detection():
    def __init__(self):
        self.model = TRTInference()

    def infer(self, frame):
        frame = cvcuda.resize(frame, ...)
        output = self.model(frame)
        return output

The error only happens when I import cvcuda. Sorry for not pointing out the exact problem in my previous post! If I don't import cvcuda, everything is smooth. I am running 3 modules like this with 3 different TRT models. I think my issue is somewhat similar to https://github.com/CVCUDA/CV-CUDA/issues/100. In the trt_infer.py (or trt_engine.py) file, I changed the import order to something like the following, but it did not fix the issue:

import tensorrt as trt
import cvcuda
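For import-order conflicts like the one referenced in issue #100, one workaround sometimes tried is deferring the cvcuda import until after TensorRT has initialized CUDA. A minimal, generic lazy-import helper sketch (cvcuda appears only in the comments; nothing here is from the CV-CUDA API):

```python
import importlib

# Cache of already-imported modules, keyed by module name.
_modules: dict = {}


def lazy_import(name: str):
    """Import a module on first use and cache it.

    Calling lazy_import("cvcuda") inside infer(), instead of importing
    cvcuda at the top of the file, means TensorRT loads (and initializes
    CUDA) first; cvcuda only gets imported afterwards.
    """
    if name not in _modules:
        _modules[name] = importlib.import_module(name)
    return _modules[name]
```

In the class above, cvcuda.resize(...) would become lazy_import("cvcuda").resize(...). Whether this actually avoids the "error: 3" failure depends on what each library does at import time, so treat it as a diagnostic step rather than a fix.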

hongsamvo avatar Apr 04 '25 04:04 hongsamvo

So the error does not happen during installation, but during import? Does the following simple script fail, for example?

import cvcuda 
from trt_infer import TRTInference

Can you post the complete stack trace? Also try just importing cvcuda. I am trying to understand whether the issue is in importing cvcuda or in installing it.

dsuthar-nvidia avatar Apr 04 '25 15:04 dsuthar-nvidia

Yes, the error only happens when I import cvcuda; installing cvcuda does not cause the error.

hongsamvo avatar Apr 05 '25 04:04 hongsamvo

The error happens when I start to run trtexec for the model where I place "import cvcuda" [screenshot]. The module where the error occurs is something like this [screenshot].

hongsamvo avatar Apr 08 '25 02:04 hongsamvo