onnx-tensorrt
Do I have to call initLibNvInferPlugins() even using built-in plugins only?
Description
Hi maintainers,
I'm working on a TensorRT-based inference project named Forward, specifically on its ONNX part.
Currently, my OnnxEngine works correctly when deployed in C++: it parses .onnx files, creates the engine, runs inference, and returns the results.
However, when I wrap the C++ code with pybind11 and deploy it in Python, everything still works (it parses files, creates engines, runs inference, and returns the results successfully), except that the engine destruction step causes a core dump.
Here is the crashing frame reported by gdb:
0x00007fffe052dba4 in nvinfer1::plugin::InstanceNormalizationPluginCreator::~InstanceNormalizationPluginCreator() () from /usr/lib64/libnvinfer_plugin.so.7
Just recently, I found a function named initLibNvInferPlugins(), and I suspect my problem might be caused by not calling it before parseFromFile. Is that possible?
Since my OnnxEngine (and my models) only depend on the built-in plugins TensorRT provides, my assumption was that initLibNvInferPlugins() only needs to be called when adding user-defined plugins; otherwise, all of my OnnxEngine unit tests should have failed.
I couldn't find much clear discussion about the usage of initLibNvInferPlugins() on Google, so I hope I can get some insights here. :)
I hope my understanding is correct, and I hope the issue comes down to some configuration in my CMakeLists.txt (if it does, at least I'll know which direction to look in...).
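In case it matters, this is what I understand an explicit registration would look like if it were needed (just a sketch based on the TensorRT samples; gLogger is the same logger object used in my builder code below):

#include <NvInferPlugin.h>  // declares initLibNvInferPlugins()

// Register every built-in TensorRT plugin creator in the global registry.
// An empty string means "no plugin namespace".
if (!initLibNvInferPlugins(&gLogger.getTRTLogger(), "")) {
  LOG(ERROR) << "initLibNvInferPlugins failed.";
}
// ... only then create the ONNX parser and call parseFromFile().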
Environment
TensorRT Version: 7.2.1
GPU Type: T4-8C
Nvidia Driver Version: 450.102.04
CUDA Version: 10.2.89
CUDNN Version: 8.0.2
Operating System: Linux
Python Version: 3.6.8
Pybind11: v2.3.dev0
Relevant Files
#include <NvOnnxParser.h>
// ...
std::shared_ptr<OnnxEngine> OnnxBuilder::Build(const std::string& model_path) {
  if (mode_ == InferMode::INVALID) {
    LOG(ERROR) << "Unsupported inference mode.";
    return nullptr;
  }

  builder_->SetInferMode(mode_);

  size_t max_workspace_size =
      reinterpret_cast<TrtForwardBuilder*>(builder_)->GetMaxWorkspaceSize();
  reinterpret_cast<TrtForwardBuilder*>(builder_)->SetMaxWorkspaceSize(
      TrtCommon::ResetMaxWorkspaceSize(max_workspace_size));

  TrtCommon::InferUniquePtr<nvinfer1::IBuilder> builder(
      nvinfer1::createInferBuilder(gLogger.getTRTLogger()));
  if (!builder) {
    LOG(ERROR) << "Create builder failed.";
    return nullptr;
  }

  const TrtCommon::InferUniquePtr<nvinfer1::INetworkDefinition> network(builder->createNetworkV2(
      1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
  if (!network) {
    LOG(ERROR) << "Create network failed.";
    return nullptr;
  }

  if (!ParseModelFromFile(model_path, network.get())) {
    LOG(ERROR) << "Parse model failed.";
    return nullptr;
  }

  auto engine = BuildEngine(builder.get(), network.get());
  if (!engine) {
    LOG(ERROR) << "BuildEngine failed.";
    return nullptr;
  }

  auto trt_fwd_engine = std::make_shared<TrtForwardEngine>();
  // By default, an initialized engine will be returned.
  const auto meta_data = reinterpret_cast<TrtForwardBuilder*>(builder_)->GetEngineMetaData();
  if (!trt_fwd_engine->Clone(engine, meta_data) || !trt_fwd_engine->InitEngine()) {
    LOG(ERROR) << "Init Engine failed.";
    return nullptr;
  }

  return std::make_shared<OnnxEngine>(trt_fwd_engine);
}
bool OnnxBuilder::ParseModelFromFile(const std::string& model_path,
                                     nvinfer1::INetworkDefinition* network) const {
  const TrtCommon::InferUniquePtr<nvonnxparser::IParser> parser(
      nvonnxparser::createParser(*network, gLogger.getTRTLogger()));
  if (!parser) {
    LOG(ERROR) << "Create parser failed.";
    return false;
  }

  // parseFromFile() returns false on failure; log the collected parser errors.
  if (!parser->parseFromFile(model_path.c_str(),
                             static_cast<int>(nvinfer1::ILogger::Severity::kINFO))) {
    for (int i = 0; i < parser->getNbErrors(); ++i) {
      LOG(ERROR) << parser->getError(i)->desc();
    }
    return false;
  }

  return true;
}
nvinfer1::ICudaEngine* OnnxBuilder::BuildEngine(nvinfer1::IBuilder* builder,
                                                nvinfer1::INetworkDefinition* network) {
  if (!SetInputType(network)) {
    LOG(ERROR) << "Set input data type failed.";
    return nullptr;
  }

  if (GetMaxBatchSize() < 0) {
    int dim0 = network->getInput(0)->getDimensions().d[0];
    int batch_size = dim0 == -1 ? 1 : dim0;
    SetMaxBatchSize(batch_size);
  }
  if (GetOptBatchSize() < 0) {
    SetOptBatchSize(GetMaxBatchSize());
  }

  auto dumped = reinterpret_cast<TrtForwardBuilder*>(builder_)->DumpNetwork(network);
  if (!dumped) {
    LOG(ERROR) << "Dump network failed.";
    return nullptr;
  }

  auto engine = reinterpret_cast<TrtForwardBuilder*>(builder_)->BuildEngine(builder, network);
  if (!engine) {
    LOG(ERROR) << "Build engine failed.";
    return nullptr;
  }

  SetOutputPositions(TrtCommon::GetOutputOrder(engine, network));
  return engine;
}
Good: test_fwd_onnx.cpp
Not good: test_forward_onnx.py (able to get outputs, then core dumps)
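For context, the pybind11 wrapper is roughly of this shape (a simplified sketch; only OnnxBuilder::Build matches the code above, the module name, the header, the OnnxBuilder constructor, and the OnnxEngine forward method are placeholders):

// bind_fwd_onnx.cpp -- simplified sketch of the pybind11 wrapper; names other
// than OnnxBuilder::Build are placeholders, not the exact code in Forward.
#include <pybind11/pybind11.h>
#include <memory>
#include <string>

#include "fwd_onnx_engine.h"  // placeholder header declaring OnnxBuilder / OnnxEngine

namespace py = pybind11;

PYBIND11_MODULE(forward_onnx, m) {
  // The engine is held as a shared_ptr on the Python side, so its destructor runs
  // whenever the last Python reference is dropped -- possibly at interpreter
  // shutdown, around the time libnvinfer_plugin's own exit handlers run.
  py::class_<OnnxEngine, std::shared_ptr<OnnxEngine>>(m, "OnnxEngine")
      .def("forward", &OnnxEngine::Forward);  // placeholder inference entry point

  py::class_<OnnxBuilder>(m, "OnnxBuilder")
      .def(py::init<>())
      .def("build", &OnnxBuilder::Build, py::arg("model_path"));
}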
The ONNX parser calls initLibNvInferPlugins() internally. This function registers the builtin TensorRT plugins.
This error may be coming from a memory leak in the InstanceNormalization plugin itself. Which version of the plugin library are you using?
It may be worthwhile upgrading to TensorRT 8 or building the open source plugins (following the instructions at https://github.com/NVIDIA/TensorRT), as the leak may have been fixed in a later release.
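If it helps to check, a small standalone program along these lines (just a sketch; the logger class is a placeholder) prints the linked TensorRT version and the plugin creators registered by libnvinfer_plugin:

// version_check.cpp -- sketch for confirming the loaded TensorRT/plugin versions.
#include <NvInfer.h>
#include <NvInferPlugin.h>
#include <cstdio>

namespace {
// Placeholder logger; any nvinfer1::ILogger implementation works here.
class StderrLogger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::fprintf(stderr, "%s\n", msg);
  }
};
}  // namespace

int main() {
  // Linked libnvinfer version, encoded as MAJOR * 1000 + MINOR * 100 + PATCH
  // (e.g. 7201 for 7.2.1).
  std::printf("TensorRT version: %d\n", getInferLibVersion());

  StderrLogger logger;
  // Registers the built-in creators from libnvinfer_plugin in the global registry.
  initLibNvInferPlugins(&logger, "");

  int32_t num_creators = 0;
  nvinfer1::IPluginCreator* const* creators =
      getPluginRegistry()->getPluginCreatorList(&num_creators);
  for (int32_t i = 0; i < num_creators; ++i) {
    std::printf("  %s (plugin version %s)\n", creators[i]->getPluginName(),
                creators[i]->getPluginVersion());
  }
  return 0;
}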
Hi @kevinch-nv ,
I'm using TensorRT-7.2.1.6, the version suggested by our team for compatibility. Since I use the ONNX parser directly through parseFromFile from the NvOnnxParser.h header, I assume 7.2.1.6 is the version you are asking about (please correct me if that's not the info you're looking for).
Besides that, I've run into more strange behavior.
The test_fwd_onnx.cpp example above was built with CMake: I first generated the shared library libfwd_onnx.so, then built the test_fwd_onnx executable against it, also with CMake. When I moved this shared library to another workspace and built the same test_fwd_onnx.cpp with Bazel instead, the same core dump happened again. Below is the backtrace from the core file.
(gdb) bt
#0 0x00007f0a306ceba4 in nvinfer1::plugin::InstanceNormalizationPluginCreator::~InstanceNormalizationPluginCreator() () from /usr/lib64/libnvinfer_plugin.so.7
#1 0x00007f0a306b2131 in nvinfer1::plugin::PluginCreatorRegistry::~PluginCreatorRegistry() () from /usr/lib64/libnvinfer_plugin.so.7
#2 0x00007f0a4430606c in __run_exit_handlers () from /usr/lib64/libc.so.6
#3 0x00007f0a443061a0 in exit () from /usr/lib64/libc.so.6
#4 0x00007f0a442ef87a in __libc_start_main () from /usr/lib64/libc.so.6
#5 0x0000000000401f2e in _start ()
We ruled out differences in compilation flags between CMake and Bazel. We also figured out that, to avoid the core dump, test_fwd_onnx.cpp had to depend on libnvonnxparser.so separately, even though our libfwd_onnx.so was already linked against TensorRT, which includes libnvonnxparser.so.
Here are the Bazel rules.
# forward.BUILD
cc_import(
    visibility = ["//visibility:public"],
    name = "forward_onnx",
    hdrs = glob(["include/*.h"]),
    shared_library = "libs/libfwd_onnx.so",
)

# WORKSPACE
new_local_repository(
    name = "trt_local",
    path = "/usr/lib64/",
    build_file_content = """
cc_import(
    name = "onnxparser",
    shared_library = "libnvonnxparser.so",
    visibility = ["//visibility:public"],
)
""",
)

new_local_repository(
    name = "forward_local",
    path = "forward_libs",
    build_file = "forward_libs/forward.BUILD",
)

# BUILD
cc_binary(
    name = "onnx_utest",
    copts = ["-g"],
    srcs = ["forward_libs/test_fwd_onnx.cpp"],
    linkopts = LINKOPTS,
    deps = [
        "@trt_local//:onnxparser",        # has to be the first dependency
        "@forward_local//:forward_onnx",  # has to be second dependency
    ],
)
Linkage for libfwd_onnx.so
[root@e14473b93409 ForwardServable]$ ldd forward_libs/libs/libfwd_onnx.so
...
libtrt_engine.so => /data/Project/Forward/build/bin/libtrt_engine.so (0x00007f0ee8384000)
libnvinfer.so.7 => /usr/lib64/libnvinfer.so.7 (0x00007f0ed706f000)
libnvinfer_plugin.so.7 => /usr/lib64/libnvinfer_plugin.so.7 (0x00007f0ed6425000)
libnvonnxparser.so.7 => /usr/lib64/libnvonnxparser.so.7 (0x00007f0ed5fa3000) # here
libnvparsers.so.7 => /usr/lib64/libnvparsers.so.7 (0x00007f0ed5a6b000)
...