Engine building failure of TensorRT 10.2.0 (pip install) when building a custom diffusion model on RTX 4090

Open ifeherva opened this issue 1 year ago • 13 comments

Description

Fresh install via pip install tensorrt==10.2.0.

The following engine build crashes on Ubuntu 22.04.4 LTS:

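from polygraphy.backend.trt import CreateConfig, EngineFromNetwork

# network, p (an optimization profile), timing_cache, extra_build_args,
# and the precision flags are defined elsewhere in the script.
engine = EngineFromNetwork(
    network,
    config=CreateConfig(
        fp16=fp16,
        tf32=tf32,
        int8=int8,
        refittable=enable_refit,
        profiles=[p],
        load_timing_cache=timing_cache,
        builder_optimization_level=3,
        **extra_build_args,
    ),
    save_timing_cache=timing_cache,
)()  # calling the loader triggers the build, which is where the crash occurs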

Error message:

IBuilder::buildSerializedNetwork: Error Code 6: API Usage Error (Unable to load library: libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Build works fine on 10.1.0 and 10.0.0
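
For anyone triaging similar logs, the missing library name can be pulled out of the error text above. This is only a minimal sketch: the regular expression and the helper names `missing_library` and `is_windows_resource` are illustrative assumptions, not part of TensorRT or Polygraphy.

```python
import re

# Illustrative pattern for the "Unable to load library" message shown above.
_MISSING_LIB = re.compile(r"Unable to load library: (?P<lib>[^\s:]+)")


def missing_library(message):
    """Return the library name TensorRT failed to load, or None if absent."""
    match = _MISSING_LIB.search(message)
    return match.group("lib") if match else None


def is_windows_resource(lib_name):
    """True if the name carries the `_win` suffix seen in this issue."""
    return "_win." in lib_name


error = (
    "IBuilder::buildSerializedNetwork: Error Code 6: API Usage Error "
    "(Unable to load library: libnvinfer_builder_resource_win.so.10.2.0: "
    "libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: "
    "No such file or directory)"
)
lib = missing_library(error)
print(lib, is_windows_resource(lib))
```

A `_win` name reported on a Linux host is the signature of this particular failure, as opposed to an ordinary missing-library error.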

Environment

TensorRT Version: 10.2.0

NVIDIA GPU: RTX 4090

NVIDIA Driver Version: 550

CUDA Version: 12.1.r12.1

CUDNN Version: 8.9.7

Operating System: Ubuntu 22.04.4 LTS

Python Version (if applicable):

PyTorch Version (if applicable): 2.3.1

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

This is the latest release.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Yes, the above command completes successfully, so the ONNX file is correct.

ifeherva avatar Jul 04 '24 18:07 ifeherva

+1. Same issue: TensorRT fails because it tries to load a nonexistent Windows library on a Linux distro (libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

lautaropaske avatar Jul 04 '24 23:07 lautaropaske

How can this be fixed?

lanyuer avatar Jul 05 '24 02:07 lanyuer

Almost exactly the same setup here, same problem. Using it via ComfyUI.

TensorRT Version: 10.2.0

NVIDIA GPU: RTX 4090

CUDA Version: 12.1.105

CUDNN Version: 8.9.2.26

Operating System: Ubuntu 22.04.3

Python Version (if applicable): 3.10

PyTorch Version (if applicable): 2.3.1+cu121

I did a little research and found that the non-Windows library (libnvinfer_builder_resource.so.10.2.0) was already opened by the process, so it's a mystery to me why it then tried to open the Windows version. The dlopen (or equivalent) happens inside the compiled tensorrt.so code, not in the Python wrapper around it, so it's hard to debug further.

I made a symlink from the Windows filename to the proper DSO, but that fixed nothing: the symbols it then looks up inside are also suffixed with _win.

I also asked in the discussion forum for the Comfy nodes (https://github.com/comfyanonymous/ComfyUI_TensorRT/discussions/49), but clearly they have nothing to do with it.

thefoxfarmer avatar Jul 05 '24 03:07 thefoxfarmer
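
To see which builder resource libraries a pip install actually shipped, the library directory can be listed. This is a rough sketch that assumes the wheels place their shared objects in an importable `tensorrt_libs` package. That is an assumption about the wheel layout and is worth verifying locally. Both helper names are made up for illustration.

```python
import importlib.util
from pathlib import Path


def find_builder_resources(lib_dir):
    """Return the sorted names of builder resource libraries in lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("libnvinfer_builder_resource*"))


def locate_package_dir(package_name="tensorrt_libs"):
    """Return the install directory of package_name, or None if not importable.

    The default package name is an assumption about the pip wheel layout.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    lib_dir = locate_package_dir()
    if lib_dir is None:
        print("tensorrt_libs package not found")
    else:
        for name in find_builder_resources(lib_dir):
            print(name)
```

If only the plain `libnvinfer_builder_resource.so.*` file is listed and no `_win` variant, that matches what is described in this thread.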

I also have this problem on Windows WSL2.

lanyuer avatar Jul 05 '24 03:07 lanyuer

10.1.0 is also working for me on the setup outlined above where 10.2.0 did not.

thefoxfarmer avatar Jul 05 '24 03:07 thefoxfarmer

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

BuffMcBigHuge avatar Jul 05 '24 17:07 BuffMcBigHuge
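
Before or after downgrading, it can help to confirm which version is actually installed. The following is a small sketch using only the standard library. The `AFFECTED` set reflects reports in this thread rather than any official compatibility list, and the function names are hypothetical.

```python
from importlib import metadata

# Versions reported as failing in this thread; not an official list.
AFFECTED = {"10.2.0"}


def installed_version(dist_name="tensorrt"):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version):
    """True if version exactly matches a release reported as failing here."""
    return version in AFFECTED


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorrt is not installed")
    elif is_affected(version):
        print(f"tensorrt {version} matches a version reported as failing")
    else:
        print(f"tensorrt {version} is not in the reported-failing set")
```

The comparison is an exact string match on purpose, so other builds of the same release line are not flagged by accident.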

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

online2311 avatar Jul 06 '24 08:07 online2311

Resolved in the Ultralytics package by pinning tensorrt<=10.1.0, but unfortunately that does not resolve the underlying issue. https://github.com/ultralytics/ultralytics/pull/14239

glenn-jocher avatar Jul 06 '24 09:07 glenn-jocher

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

same!!! thank uuuu

RONNYKHALIL avatar Jul 06 '24 18:07 RONNYKHALIL

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Can you find libnvinfer_builder_resource_win.so.10.2.0?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

lix19937 avatar Jul 07 '24 10:07 lix19937
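
The support statement quoted above can be turned into a quick self-check. This is a sketch under the assumption that the quoted limits (Python 3.8 to 3.12, Linux or Windows, x86_64) are the only constraints of interest. `wheel_supported` is a hypothetical helper, not a TensorRT API.

```python
import platform
import sys


def wheel_supported(py_version, system, machine):
    """Check an environment against the wheel support statement quoted above."""
    py_ok = (3, 8) <= tuple(py_version[:2]) <= (3, 12)
    os_ok = system in {"Linux", "Windows"}
    # platform.machine() reports "AMD64" on 64-bit Windows and "x86_64" on Linux.
    arch_ok = machine.lower() in {"x86_64", "amd64"}
    return py_ok and os_ok and arch_ok


if __name__ == "__main__":
    ok = wheel_supported(sys.version_info, platform.system(), platform.machine())
    print("supported" if ok else "not supported")
```

The reporter's setup (Ubuntu 22.04 on x86_64) falls inside these limits, which is consistent with the failure being a packaging problem rather than an unsupported platform.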

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Can you find libnvinfer_builder_resource_win.so.10.2.0?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

No, those _win files don't exist on Ubuntu.

ifeherva avatar Jul 07 '24 14:07 ifeherva

For me installing tensorrt_llm==0.12.0.dev2024070200 works!

zolero avatar Jul 09 '24 07:07 zolero

Upgrading to tensorrt==10.2.0.post1 fixes the problem.

yorickvP avatar Jul 26 '24 13:07 yorickvP

Closing this ticket; please re-open if this is still an issue on TensorRT 10.8.

brnguyen2 avatar Feb 11 '25 16:02 brnguyen2