[BUG] cvcuda causes an error in the TensorRT 23.07 image
I installed cvcuda in the TensorRT image 23.07 (and 23.09) and got the error `CUDA initialization failure with error: 3`. I tried installing from both the Python wheel file and the deb file (versions 0.13.0 and 0.14.0) but could not solve the problem. Has anyone run into the same problem? Thank you!
@hongsamvo We have tested the samples with TensorRT image 24.01-py3 as mentioned in the samples readme.
Generally, CUDA initialization errors have common causes such as:
- Driver Issues: The installed NVIDIA driver may be outdated, incompatible, or improperly installed.
- CUDA Toolkit Mismatch: There may be a version mismatch between your CUDA toolkit and the GPU driver.
- Hardware or Resource Problems: The GPU might be in an error state, or system resources might be insufficient (e.g., running out of memory).
- Environment Issues: If you're running in a container or a remote desktop session, the GPU might not be accessible as expected. Docker containers need the `--gpus` flag to make sure GPUs are accessible.
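When debugging causes like these, it can help to query the CUDA driver directly, bypassing any framework. A minimal sketch (not from this thread; it only assumes `libcuda.so.1` may or may not be visible in the environment) that calls `cuInit` via ctypes and reports the raw `CUresult` code:

```python
import ctypes

def cu_init_result():
    """Call cuInit(0) from the CUDA driver library and return the raw
    CUresult code (0 means success), or None if libcuda is not found."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # driver library not present (e.g. container run without --gpus)
    return libcuda.cuInit(0)

print(cu_init_result())
```

Running this both on the host and inside the container can show whether the driver is visible at all, and the nonzero code (if any) can be looked up in the CUDA driver API error list.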
Is `nvidia-smi` working fine? Does the issue only happen when you install and import cvcuda?
Thank you for the answer! My machine has a GTX 3060, Driver Version 535.183.01, CUDA Version 12.2. tensorrt:23.07-py3 has CUDA toolkit 12.1, so I think everything matches the cvcuda_cu12 requirements. I also noticed that the error only happens when I try to install cvcuda during docker build. If I go inside the docker container and install cvcuda, the error does not happen, and `nvidia-smi` works fine on my machine.
I updated my machine's driver to version 550 and used the TensorRT image 24.01-py3, but I got the same error. This is my Dockerfile:

```dockerfile
# load base image tensorrt
FROM nvcr.io/nvidia/tensorrt:24.01-py3

# install opencv
RUN pip install opencv-python

# fix missing lib for opencv
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 curl -y

# install newest torch
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# install other libs
RUN pip install psutil minio flask flask_cors requests zmq pickle5 pyyaml \
    cryptography redis shapely scipy lap sentry_sdk easydict filterpy Cython cvcuda_cu12
```
@hongsamvo I tested the following and it seems to be working fine for me:
Dockerfile:

```dockerfile
FROM nvcr.io/nvidia/tensorrt:24.01-py3
RUN pip install opencv-python
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 curl -y
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
RUN pip install psutil minio flask flask_cors requests zmq pickle5 pyyaml cryptography redis shapely scipy lap sentry_sdk easydict filterpy Cython cvcuda_cu12
RUN pip3 install cvcuda-cu12
```
Installing cvcuda should not require running any GPU code, so it should work during docker build; `docker build` does not run with GPUs.
- Can you try building the above Dockerfile?
- What Docker and NVIDIA Container Toolkit versions do you have?
@dsuthar-nvidia Thank you for your kind support. I forgot to mention that the error happens when I start the container and some TRT model plans are being loaded. I suspect that cvcuda creates some processes that may conflict with the processes loading the TRT plans. My Docker version is 26.1.1 and my NVIDIA Container Toolkit version is 1.13.5.
My project structure is something like this; each TRT model is modularized as a class:

```python
import cvcuda
from trt_infer import TRTInference

class Detection:
    def __init__(self):
        self.model = TRTInference()

    def infer(self, frame):
        frame = cvcuda.resize(frame, ...)
        output = self.model(frame)
        return output
```
The error only happens when I import cvcuda. Sorry for not pointing out the exact problem in my previous post! If I don't import cvcuda, everything is smooth. I am running 3 modules like this with 3 different TRT models. I think my issue is a bit similar to https://github.com/CVCUDA/CV-CUDA/issues/100. In the trt_infer.py (or trt_engine.py) file, I changed the order of the imports, something like this, but it did not fix the issue:

```python
import tensorrt as trt
import cvcuda
```
So the error does not happen during installation, but during import? That is, a simple script like the following fails?

```python
import cvcuda
from trt_infer import TRTInference
```
Can you post the complete stack trace? Also, try just importing cvcuda. I am trying to understand whether the issue is in importing cvcuda or in installing cvcuda.
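One way to narrow this down is to attempt each import in a clean interpreter and capture the failure. A small subprocess helper (a sketch, not from the thread) that isolates which module's import actually triggers the error:

```python
import subprocess
import sys

def try_import(module):
    """Import `module` in a fresh Python interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stderr.strip()

# Try each suspect module in isolation and report the last error line on failure
for mod in ("tensorrt", "cvcuda"):
    code, err = try_import(mod)
    print(mod, "OK" if code == 0 else f"FAILED: {err.splitlines()[-1] if err else code}")
```

Running this inside the failing container should show whether `import cvcuda` alone reproduces the error, or only the combination with tensorrt does.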
Yes, the error only happens when I import cvcuda; installing cvcuda does not cause the error.
The error happens when I start to run trtexec for the model where I import cvcuda.
The module where I get the error is something like this: