
GroupNormalization plugin failure of TensorRT 10.0.1.6 when running trtexec on GPU A4000

Open appearancefnp opened this issue 1 year ago • 9 comments

Description

Hey guys! I wanted to upgrade from TensorRT 8.6 to 10.0. I have an ONNX model that uses the GroupNormalization plugin. trtexec builds and serializes the engine successfully, but deserialization fails because the plugin tries to load cuDNN 8 (libcudnn.so.8) instead of the installed cuDNN 9.
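Not part of the original report, but for anyone triaging a similar mismatch: the plugin's cudnnWrapper resolves cuDNN at runtime via dlopen, so a small ctypes probe (the helper name `probe_lib` is mine, not TensorRT's) can show which cuDNN major versions the dynamic loader can actually find. In the 24.05 container only the .so.9 family should resolve, which would match the error below.

```python
import ctypes

def probe_lib(soname: str) -> bool:
    """Return True if the dynamic loader can resolve `soname`
    (the same dlopen mechanism the plugin's cudnnWrapper relies on)."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

# Assumption: in nvcr.io/nvidia/tensorrt:24.05-py3, libcudnn.so.9 is
# present and libcudnn.so.8 is not -- consistent with the
# "Failed to load libcudnn.so.8" failure in the log excerpt.
for name in ("libcudnn.so.8", "libcudnn.so.9"):
    print(name, "->", "found" if probe_lib(name) else "missing")
```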

Environment

Using docker: nvcr.io/nvidia/tensorrt:24.05-py3

TensorRT Version: 10.0.1

NVIDIA GPU: A4000

NVIDIA Driver Version: 550.67

CUDA Version: 12.4

CUDNN Version: 9.1 (per container documentation)

Operating System:

Python Version (if applicable): -

Tensorflow Version (if applicable): -

PyTorch Version (if applicable): -

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.05-py3

Relevant Files

Model link: https://drive.google.com/file/d/1vmGZpWJ_1sfz2ejbZoO3fFaR5udxOLTi/view?usp=sharing

Steps To Reproduce

  1. Run trtexec: trtexec --onnx=model.onnx
  2. trtexec builds and serializes the engine, then fails while deserializing and running it (log excerpt):
...
[06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 1984 MiB
[06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3059 MiB
[06/17/2024-14:57:28] [I] Engine built in 886.712 sec.
[06/17/2024-14:57:28] [I] Created engine with size: 55.3649 MiB
[06/17/2024-14:57:28] [I] [TRT] Loaded engine size: 55 MiB
[06/17/2024-14:57:28] [I] Engine deserialized in 0.0301295 sec.
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
plugin/common/cudnnWrapper.cpp:90

...
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +156, now: CPU 1, GPU 199 (MiB)
[06/17/2024-14:57:28] [I] Setting persistentCacheLimit to 0 bytes.
[06/17/2024-14:57:28] [I] Created execution context with device memory size: 155.537 MiB
[06/17/2024-14:57:28] [I] Using random values for input images
[06/17/2024-14:57:28] [I] Input binding for images with dimensions 1x500x1000x3 is created.
[06/17/2024-14:57:28] [I] Output binding for class_heatmaps with dimensions 1x5x125x250 is created.
[06/17/2024-14:57:28] [I] Starting inference
[06/17/2024-14:57:28] [F] [TRT] Validation failed: mBnScales != nullptr && mBnScales->mPtr != nullptr
plugin/groupNormalizationPlugin/groupNormalizationPlugin.cpp:132

[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [E] Error[2]: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion pluginUtils::isSuccess(status) failed. )
[06/17/2024-14:57:28] [E] Error occurred during inference

Commands or scripts:
trtexec --onnx=model.onnx

Have you tried the latest release?: yes

appearancefnp · Jun 18 '24 07:06