                        Segmentation fault for TensorRT 8.6 when loading ONNX model via C++ API on GPU V100
Description
I tried to use the C++ API to load the attached ONNX model, but it fails with a segmentation fault (core dumped). Note: this is possibly related to https://github.com/NVIDIA/TensorRT/issues/3630; it is the same model but with a fixed batch size of 1.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: V100
NVIDIA Driver Version: 545.23.08
CUDA Version: 12.1
CUDNN Version: 8.9.0.131-1+cuda12.1
Operating System: Ubuntu 20.04
Python Version (if applicable): N/A
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if so, version): N/A
Relevant Files
Model link: https://drive.google.com/file/d/1uoy0EcJj8BKq1F_Fd8HuYofye9KkPBHu/view?usp=sharing
Output Log: trtsegfault.txt
Steps To Reproduce
Use the following C++ code, which follows the TensorRT sample for loading an ONNX model:
#include "NvInfer.h"
#include "NvInferPlugin.h"
#include "NvOnnxConfig.h"
#include "NvOnnxParser.h"
class TestLogger : public nvinfer1::ILogger {
 public:
  void log(Severity severity,
           nvinfer1::AsciiChar const* msg) noexcept override {
    std::cout << msg << std::endl;
  }
};
TestLogger logger;
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
nvinfer1::INetworkDefinition* network = builder->createNetworkV2(explicitBatch);
nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);
auto model = "trtcppapi_segfault.onnx";
auto parsed = parser->parseFromFile(model, 0);
cudaStream_t profile_stream = 0;
cudaStreamCreate(&profile_stream);
config->setProfileStream(profile_stream);
nvinfer1::IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);
nvinfer1::IRuntime* mRuntime = nvinfer1::createInferRuntime(logger);
nvinfer1::ICudaEngine* mEngine = mRuntime->deserializeCudaEngine(plan->data(), plan->size());
nvinfer1::IExecutionContext* context = mEngine->createExecutionContext();
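The reproducer above ignores return values. As a sketch (the helper name dumpParserErrors is mine, not from the sample), the function below prints the diagnostics recorded by the ONNX parser, so a parse failure would be visible before the build step:

// Sketch only: print any diagnostics recorded by the ONNX parser.
// Intended to be called right after parseFromFile() when `parsed` is false.
#include <iostream>
#include "NvOnnxParser.h"

void dumpParserErrors(nvonnxparser::IParser& parser) {
  for (int32_t i = 0; i < parser.getNbErrors(); ++i) {
    nvonnxparser::IParserError const* err = parser.getError(i);
    std::cerr << "[parser error] node " << err->node()
              << ": " << err->desc() << std::endl;
  }
}

If the parser reports errors here, the crash is likely a parse-time issue rather than a problem in buildSerializedNetwork.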
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): Yes, it can be run with Polygraphy 0.49.0 in the same environment:
polygraphy run trtcppapi_segfault.onnx --onnxrt
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: polygraphy run trtcppapi_segfault.onnx --onnxrt
[I] onnxrt-runner-N0-01/24/24-12:37:23  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-01/24/24-12:37:23 
    ---- Inference Input(s) ----
    {img [dtype=float32, shape=(1, 3, 720, 1280)],
     seg [dtype=float32, shape=(1, 1, 720, 1280)]}
[I] onnxrt-runner-N0-01/24/24-12:37:23 
    ---- Inference Output(s) ----
    {mask [dtype=float32, shape=(1, 1, 720, 1280)]}
[I] onnxrt-runner-N0-01/24/24-12:37:23  | Completed 1 iteration(s) in 5096 ms | Average inference time: 5096 ms.
[I] PASSED | Runtime: 6.203s | Command: polygraphy run trtcppapi_segfault.onnx --onnxrt
@zerollzeng I encountered this issue as well with the same TensorRT version. When compiling and running the sampleOnnxMNIST sample, the same problem appeared, but with a clearer error message:
&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v8601] # ./sample_onnx_mnist
[02/29/2024-11:04:54] [I] Building and running a GPU inference engine for Onnx MNIST
[02/29/2024-11:04:54] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 19, GPU 252 (MiB)
[02/29/2024-11:04:54] [E] [TRT] 6: [libLoader.cpp::Impl::293] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.6.1: libnvinfer_builder_resource.so.8.6.1: cannot open shared object file: No such file or directory)
&&&& FAILED TensorRT.sample_onnx_mnist [TensorRT v8601] # ./sample_onnx_mnist
I believe it's related to this issue; patching the rpath of libnvinfer solved my problem: https://github.com/NVIDIA/TensorRT/issues/2218
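As a quick sanity check (a standalone sketch, not TensorRT API; the file name check_lib.cpp is arbitrary), you can ask the dynamic loader directly whether it can resolve the library named in the error message:

// Standalone sketch: check whether the dynamic loader can find the
// TensorRT builder resource library named in the error message.
// Build with: g++ check_lib.cpp -ldl
#include <cstdio>
#include <dlfcn.h>

int main() {
  const char* lib = "libnvinfer_builder_resource.so.8.6.1";
  void* handle = dlopen(lib, RTLD_LAZY);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }
  std::printf("%s resolved successfully\n", lib);
  dlclose(handle);
  return 0;
}

If this fails, the fix is a loader-path issue (rpath, LD_LIBRARY_PATH, or install layout) rather than anything in the model or the API usage.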