
🐛 [Bug] error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr


Bug Description

Hi, I see the following error. It looks like torch.compile worked fine, but when I invoke a prediction afterwards it errors out:

[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[W] Unable to determine GPU memory usage
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [W] Unable to determine GPU memory usage
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1104, GPU 0 (MiB)
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr

Does torch-tensorrt work with a g4dn.xlarge? Why do I get this: CUDA initialization failure with error: 35?

full log: tensorrt_torch_error.txt
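
For reference, CUDA error 35 is cudaErrorInsufficientDriver: the CUDA runtime being loaded is newer than what the installed driver supports. A minimal sanity check that leaves TensorRT out of the picture might look like this (a sketch; the printed values are illustrative):

# Driver/runtime sanity check (sketch) -- run inside the same container.
import torch

print(torch.version.cuda)          # CUDA version the torch wheel was built against
print(torch.cuda.is_available())   # False often points to a driver/runtime mismatch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))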

To Reproduce

Steps to reproduce the behavior:

  1. Build the container with TensorRT:
# Use the SageMaker DLC as the base image
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker

# Install additional dependencies
RUN python -m pip install torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/cu118
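
An unpinned install like the one above can pull mismatched wheels (e.g. CUDA 11 and CUDA 12 builds side by side). A quick post-install check, as a sketch with illustrative expected output:

# Post-install sanity check (sketch): confirm the wheels agree on one CUDA flavor.
import tensorrt
import torch
import torch_tensorrt

print(torch.__version__, torch.version.cuda)   # e.g. 2.1.0+cu118 / 11.8
print(torch_tensorrt.__version__)              # should align with the torch version
print(tensorrt.__version__)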

How was the model compiled?

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, backend="torch_tensorrt", dynamic=False,
                                options={"truncate_long_and_double": True,
                                         "precision": torch.half,
                                         "debug": True,
                                         "min_block_size": 1,
                                         "optimization_level": 4,
                                         "use_python_runtime": False})

To rule out that the issue is somewhere else, I tested with the following torch.compile call; this works fine:

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, mode="reduce-overhead")

Should I try some other settings for torch.compile(model.model_body[0].auto_model, backend="torch_tensorrt", ...)?
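
For illustration, one variant worth trying (a sketch, not a confirmed fix): the original call sets use_python_runtime=False, and the pybind11 factory error suggests the C++ runtime bindings are what fails to initialize, so flipping that single option may help narrow it down:

# Sketch: identical compile, but on the Python runtime, to test whether the
# pybind11 failure comes from the C++ runtime bindings.
model.model_body[0].auto_model = torch.compile(
    model.model_body[0].auto_model,
    backend="torch_tensorrt",
    dynamic=False,
    options={
        "truncate_long_and_double": True,
        "precision": torch.half,
        "debug": True,
        "min_block_size": 1,
        "optimization_level": 4,
        "use_python_runtime": True,   # the only change from the call above
    },
)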

Could the error be related to https://github.com/NVIDIA/TensorRT/issues/308?

Expected behavior

No error.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0):
  • PyTorch Version (e.g. 1.0): 2.1
  • CPU Architecture: g4dn.xlarge
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

geraldstanje commented on May 10, 2024

Can you share something like the nvidia-smi printout, so we can see the driver version and status?

narendasan commented on May 14, 2024

@narendasan Sure. In the meantime, where can I check compatibility between the CUDA driver, PyTorch version, Torch-TensorRT version, etc.?

geraldstanje commented on May 14, 2024

For PyTorch vs. Torch-TensorRT compatibility, the versions are aligned, so PyTorch v2.2.0 <-> Torch-TensorRT v2.2.0 (prior to PyTorch 2.0 the numbering was offset, e.g. PyTorch 1.13 <-> Torch-TensorRT 1.3.0). Driver compatibility is governed by CUDA; see https://docs.nvidia.com/deploy/cuda-compatibility/index.html. So if your PyTorch build targets CUDA 11.8 you need a driver >= 450.80.02, and if you are using a CUDA 12.1 PyTorch you need >= 525.60.13. nvidia-smi can help you determine whether your CUDA version and driver are aligned.
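
A rough way to compare the two values in one place (a sketch; the driver thresholds are the ones quoted above):

# Driver vs. runtime comparison (sketch; thresholds from the CUDA
# compatibility table linked above).
import subprocess
import torch

driver = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
).strip()
print("driver:", driver)                  # needs >= 450.80.02 for CUDA 11.8,
print("torch CUDA:", torch.version.cuda)  # >= 525.60.13 for CUDA 12.1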

narendasan commented on May 14, 2024

@narendasan I tried it with:

  • nvidia-smi:
NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.8
  • nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
  • GPU: Nvidia Tesla T4

  • Torch v2.2.0

  • Torch-TensorRT v2.2.0

  • pip list output:

Package Version
--------------------------- --------------
aiohttp 3.9.5
aiosignal 1.3.1
aniso8601 9.0.1
ansi2html 1.9.1
archspec 0.2.2
arrow 1.3.0
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
awscli 1.32.108
blinker 1.8.2
boltons 23.1.1
boto3 1.34.108
botocore 1.34.108
Brotli 1.1.0
cached-property 1.5.2
captum 0.6.0
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
conda 23.11.0
conda-content-trust 0.2.0
conda-libmamba-solver 23.12.0
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
contourpy 1.2.1
cryptography 42.0.7
cycler 0.12.1
Cython 3.0.10
datasets 2.19.1
decorator 5.1.1
dill 0.3.8
distro 1.8.0
docutils 0.16
enum-compat 0.0.3
evaluate 0.4.2
exceptiongroup 1.2.1
executing 2.0.1
filelock 3.14.0
Flask 3.0.3
Flask-RESTful 0.3.10
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.3.1
h5py 3.11.0
huggingface-hub 0.23.1
idna 3.7
ipython 8.18.0
itsdangerous 2.2.0
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
jsonpatch 1.33
jsonpointer 2.4
kiwisolver 1.4.5
libmambapy 1.5.5
mamba 1.5.5
MarkupSafe 2.1.5
matplotlib 3.9.0
matplotlib-inline 0.1.7
menuinst 2.0.1
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvgpu 0.10.0
nvidia-cublas-cu11 11.11.3.6
nvidia-cublas-cu12 12.5.2.13
nvidia-cuda-cupti-cu11 11.8.87
nvidia-cuda-nvrtc-cu11 11.8.89
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cuda-runtime-cu12 12.5.39
nvidia-cudnn-cu11 8.7.0.84
nvidia-cudnn-cu12 9.1.1.17
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.3.0.86
nvidia-cusolver-cu11 11.4.1.48
nvidia-cusparse-cu11 11.7.5.86
nvidia-nccl-cu11 2.19.3
nvidia-nvtx-cu11 11.8.86
opencv-python 4.9.0.80
packaging 23.2
pandas 2.2.2
parso 0.8.4
pexpect 4.9.0
pillow 10.3.0
pip 24.0
platformdirs 4.1.0
pluggy 1.3.0
prompt-toolkit 3.0.38
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 15.0.2
pyarrow-hotfix 0.6
pyasn1 0.6.0
pycosat 0.6.6
pycparser 2.21
Pygments 2.18.0
pynvml 11.5.0
pyOpenSSL 24.1.0
pyparsing 3.1.2
PySocks 1.7.1
python-dateutil 2.9.0
pytz 2024.1
PyYAML 6.0
regex 2024.5.15
requests 2.31.0
retrying 1.3.4
rsa 4.7.2
ruamel.yaml 0.18.5
ruamel.yaml.clib 0.2.7
s3transfer 0.10.1
safetensors 0.4.3
sagemaker-inference 1.10.1
sagemaker-pytorch-inference 2.0.23
scikit-learn 1.4.2
scipy 1.13.0
sentence-transformers 2.7.0
setfit 1.0.1
setuptools 68.2.2
six 1.16.0
stack-data 0.6.3
sympy 1.12
tabulate 0.9.0
tensorrt 8.6.1.post1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
termcolor 2.4.0
threadpoolctl 3.5.0
tokenizers 0.15.2
torch 2.2.0+cu118
torch-model-archiver 0.11.0
torch-tensorrt 2.2.0+cu118
torchaudio 2.2.0+cu118
torchdata 0.7.1+5e6f7b7
torchserve 0.11.0
torchtext 0.17.0+cu118
torchvision 0.17.0+cu118
tqdm 4.66.4
traitlets 5.14.3
transformers 4.37.2
triton 2.2.0
truststore 0.8.0
types-python-dateutil 2.9.0.20240316
typing_extensions 4.11.0
tzdata 2024.1
urllib3 1.26.18
wcwidth 0.2.13
Werkzeug 3.0.3
wheel 0.42.0
xxhash 3.4.1
yarl 1.9.4
zstandard 0.22.0

and I get the same error. Is that expected?

geraldstanje commented on May 26, 2024

@geraldstanje I tried the ResNet example in https://pytorch.org/TensorRT/tutorials/_rendered_examples/dynamo/torch_compile_resnet_example.html with NVIDIA-SMI 470.103.01, Driver Version: 470.103.01, CUDA Version: 11.8. The GPU is an NVIDIA A100 80G. nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

and pip list shows:

Package                  Version
------------------------ ------------
certifi                  2024.6.2
charset-normalizer       3.3.2
filelock                 3.15.4
fsspec                   2024.6.1
huggingface-hub          0.23.4
idna                     3.7
Jinja2                   3.1.4
joblib                   1.4.2
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.2.1
numpy                    1.25.2
nvidia-cublas-cu11       11.11.3.6
nvidia-cublas-cu12       12.5.3.2
nvidia-cuda-cupti-cu11   11.8.87
nvidia-cuda-nvrtc-cu11   11.8.89
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cuda-runtime-cu12 12.5.82
nvidia-cudnn-cu11        8.7.0.84
nvidia-cudnn-cu12        9.1.1.17
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.3.0.86
nvidia-cusolver-cu11     11.4.1.48
nvidia-cusparse-cu11     11.7.5.86
nvidia-nccl-cu11         2.19.3
nvidia-nvtx-cu11         11.8.86
onnx                     1.16.1
packaging                24.1
pillow                   10.3.0
pip                      24.0
protobuf                 5.27.2
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.32.3
safetensors              0.4.3
scikit-learn             1.5.0
scipy                    1.13.1
sentence-transformers    3.0.1
setuptools               69.5.1
sympy                    1.12.1
tensorrt                 8.6.1.post1
tensorrt-bindings        8.6.1
tensorrt-libs            8.6.1
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.2.0+cu118
torch-tensorrt           2.2.0+cu118
torchvision              0.17.0+cu118
tqdm                     4.66.4
transformers             4.42.3
triton                   2.2.0
typing_extensions        4.12.2
urllib3                  2.2.2
wheel                    0.43.0

Have you or anyone else fixed this bug? Please let me know, thank you very much!

tanzelin430 commented on Jul 8, 2024