getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
System Info
```
tensorrt                  10.0.1
tensorrt-cu12             10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.10.0.dev2024050700
```
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Build with the following script; the build succeeds:
```shell
set -e
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=fp16
export DTYPE=bfloat16
export TP_SIZE=4

python ../llama/convert_checkpoint.py \
  --model_dir $MODEL_DIR/${MODEL_NAME} \
  --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --dtype $DTYPE \
  --tp_size $TP_SIZE

trtllm-build \
  --checkpoint_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --output_dir $MODEL_DIR/tmp/trt_engines/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --gemm_plugin $DTYPE \
  --gpt_attention_plugin $DTYPE \
  --use_fused_mlp \
  --max_batch_size 1 \
  --max_input_len 2048 \
  --max_output_len 1024
```
Load the engine:
```python
import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load a TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank{rank}.engine'
print(f"Process {rank} loading engine from {engine_path}")
load_engine(engine_path)
```
and get the following error:
```
Process 0 loading engine from /mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank0.engine
[05/13/2024-02:59:35] [TRT] [I] Loaded engine size: 22480 MiB
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
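The error above means the TensorRT plugin registry has no creator for `Gemm` in the `tensorrt_llm` namespace: a plain `import tensorrt` never loads the TensorRT-LLM plugin library, so its creators are never registered. Below is a minimal sketch of registering them by hand before deserializing. It is an assumption-laden illustration, not the official API: it assumes the shared library exposes an `initTrtLlmPlugins(void*, const char*)` entry point (as the TensorRT-LLM sources do) and that the library is on the loader path; the path argument here is only an example.

```python
import ctypes

def register_trtllm_plugins(lib_path="libnvinfer_plugin_tensorrt_llm.so"):
    """Load the TensorRT-LLM plugin library and register its plugin creators.

    Returns True on success, False if the library cannot be loaded or does
    not expose the expected initTrtLlmPlugins entry point.
    """
    try:
        # RTLD_GLOBAL so symbols are visible to TensorRT when it resolves creators
        handle = ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return False
    if not hasattr(handle, "initTrtLlmPlugins"):
        return False
    # initTrtLlmPlugins(logger, namespace) registers all TensorRT-LLM
    # plugin creators under the "tensorrt_llm" namespace.
    handle.initTrtLlmPlugins.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    handle.initTrtLlmPlugins.restype = ctypes.c_bool
    return bool(handle.initTrtLlmPlugins(None, b"tensorrt_llm"))
```

In practice, simply doing `import tensorrt_llm` in the loading script should have the same effect, since the package loads this library on import.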
Expected behavior
The engine should load successfully.
Actual behavior
The engine fails to load.
Additional notes
Plugin directory:
```
llama git:(trtllm-build2) ✗ ll /app/tensorrt-llm/cpp/build/tensorrt_llm/plugins
total 447M
drwxr-xr-x 3 root root  106 May 10 11:17 CMakeFiles
-rw-r--r-- 1 root root  52K May 10 11:17 Makefile
drwxr-xr-x 3 root root   67 May 10 11:17 bertAttentionPlugin
-rw-r--r-- 1 root root 5.3K May 10 11:17 cmake_install.cmake
drwxr-xr-x 3 root root   67 May 10 11:17 common
drwxr-xr-x 3 root root   67 May 10 11:17 cumsumLastDimPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionCommon
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 identityPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 layernormQuantizationPlugin
lrwxrwxrwx 1 root root   36 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so -> libnvinfer_plugin_tensorrt_llm.so.10
lrwxrwxrwx 1 root root   40 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10 -> libnvinfer_plugin_tensorrt_llm.so.10.0.1
-rwxr-xr-x 1 root root 447M May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10.0.1
drwxr-xr-x 3 root root   67 May 10 11:17 lookupPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 loraPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 lruPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mambaConv1dPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mixtureOfExperts
drwxr-xr-x 3 root root   67 May 10 11:17 ncclPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizePerTokenPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizeTensorPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 rmsnormQuantizationPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 selectiveScanPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 smoothQuantGemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyGroupwiseQuantMatmulPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyQuantMatmulPlugin
```
List the registered plugins with this script:
```python
import tensorrt as trt

# Initialize the TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def list_plugins():
    plugin_registry = trt.get_plugin_registry()
    if plugin_registry is None:
        print("No plugin registry found.")
        return
    plugin_creators = plugin_registry.plugin_creator_list
    num_plugins = len(plugin_creators)
    print(f"Number of registered plugins: {num_plugins}")
    for i, plugin_creator in enumerate(plugin_creators):
        print(f"Plugin {i + 1}, Name: {plugin_creator.name}, Version: {plugin_creator.plugin_version}")

if __name__ == "__main__":
    list_plugins()
```
which prints the following plugin list:
```
Number of registered plugins: 30
Plugin 1, Name: CaskDeconvV2RunnerWeightsTransformerPlugin, Version: 1
Plugin 2, Name: CaskDeconvV1RunnerWeightsTransformerPlugin, Version: 1
Plugin 3, Name: CaskConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 4, Name: CaskFlattenConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 5, Name: CaskConvActPoolWeightsTransformerPlugin, Version: 1
Plugin 6, Name: CaskDepSepConvWeightsTransformerPlugin, Version: 1
Plugin 7, Name: MyelinWeightsTransformPlugin, Version: 1
Plugin 8, Name: DisentangledAttention_TRT, Version: 1
Plugin 9, Name: CustomEmbLayerNormPluginDynamic, Version: 1
Plugin 10, Name: CustomEmbLayerNormPluginDynamic, Version: 2
Plugin 11, Name: CustomEmbLayerNormPluginDynamic, Version: 3
Plugin 12, Name: CustomFCPluginDynamic, Version: 1
Plugin 13, Name: CustomGeluPluginDynamic, Version: 1
Plugin 14, Name: GroupNormalizationPlugin, Version: 1
Plugin 15, Name: CustomSkipLayerNormPluginDynamic, Version: 3
Plugin 16, Name: CustomSkipLayerNormPluginDynamic, Version: 4
Plugin 17, Name: CustomSkipLayerNormPluginDynamic, Version: 1
Plugin 18, Name: CustomSkipLayerNormPluginDynamic, Version: 2
Plugin 19, Name: RnRes2Br1Br2c_TRT, Version: 1
Plugin 20, Name: RnRes2Br1Br2c_TRT, Version: 2
Plugin 21, Name: RnRes2Br2bBr2c_TRT, Version: 1
Plugin 22, Name: RnRes2Br2bBr2c_TRT, Version: 2
Plugin 23, Name: RnRes2FullFusion_TRT, Version: 1
Plugin 24, Name: SingleStepLSTMPlugin, Version: 1
Plugin 25, Name: CustomQKVToContextPluginDynamic, Version: 3
Plugin 26, Name: CustomQKVToContextPluginDynamic, Version: 1
Plugin 27, Name: CustomQKVToContextPluginDynamic, Version: 2
Plugin 28, Name: DLRM_BOTTOM_MLP_TRT, Version: 1
Plugin 29, Name: SmallTileGEMM_TRT, Version: 1
Plugin 30, Name: RNNTEncoderPlugin, Version: 1
```
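Note that this list contains only TensorRT's built-in plugins, none from the `tensorrt_llm` namespace. The odd-looking name in the error, `Gemmtensorrt_llm`, appears to be the plugin name `Gemm` immediately followed by its namespace `tensorrt_llm`; a purely illustrative sketch of that reading (the function name is made up for this example):

```python
def missing_plugin_message(name: str, namespace: str, version: int) -> str:
    # Reconstructs the "could not find plugin" key as the log shows it:
    # the plugin name concatenated with its registry namespace.
    return f"getPluginCreator could not find plugin: {name}{namespace} version: {version}"

print(missing_plugin_message("Gemm", "tensorrt_llm", 1))
```

So the engine is looking up the creator `("Gemm", "tensorrt_llm", 1)`, which is absent from the registry above.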
It is caused by a mismatch of the TRT version. Have you rebuilt the docker image after upgrading TensorRT-LLM to 0.10.0? TensorRT-LLM 0.10.0 uses TensorRT 10, while older TensorRT-LLM uses TensorRT 9.
@byshiue yes, I've rebuilt the docker image. As the listing above shows, TensorRT is already 10.0.1: libnvinfer_plugin_tensorrt_llm.so.10.0.1
```
ll /usr/local/tensorrt/lib/
total 3.5G
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so -> libnvinfer.so.10.0.1
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so.10 -> libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 224M Apr 15 23:25 libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 1.3G Apr 15 23:26 libnvinfer_builder_resource.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so -> libnvinfer_dispatch.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so.10 -> libnvinfer_dispatch.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:22 libnvinfer_dispatch.so.10.0.1
-rw-r--r-- 1 root root 751K Apr 15 23:22 libnvinfer_dispatch_static.a
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so -> libnvinfer_lean.so.10.0.1
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so.10 -> libnvinfer_lean.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:22 libnvinfer_lean.so.10.0.1
-rw-r--r-- 1 root root 243M Apr 15 23:22 libnvinfer_lean_static.a
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so -> libnvinfer_plugin.so.10.0.1
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so.10 -> libnvinfer_plugin.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:26 libnvinfer_plugin.so.10.0.1
-rw-r--r-- 1 root root  37M Apr 15 23:26 libnvinfer_plugin_static.a
-rw-r--r-- 1 root root 1.7G Apr 15 23:26 libnvinfer_static.a
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so -> libnvinfer_vc_plugin.so.10.0.1
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so.10 -> libnvinfer_vc_plugin.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:26 libnvinfer_vc_plugin.so.10.0.1
-rw-r--r-- 1 root root 442K Apr 15 23:26 libnvinfer_vc_plugin_static.a
lrwxrwxrwx 1 root root   21 Apr 15 23:26 libnvonnxparser.so -> libnvonnxparser.so.10
lrwxrwxrwx 1 root root   25 Apr 15 23:26 libnvonnxparser.so.10 -> libnvonnxparser.so.10.0.1
-rwxr-xr-x 1 root root 3.4M Apr 15 23:22 libnvonnxparser.so.10.0.1
-rw-r--r-- 1 root root  19M Apr 15 23:22 libnvonnxparser_static.a
-rw-r--r-- 1 root root 675K Apr 15 23:26 libonnx_proto.a
drwxr-xr-x 2 root root  168 Apr 15 23:26 stubs
```
How do you build the docker image and the tensorrt_llm?
With the following Dockerfile:
```dockerfile
# Use an official NVIDIA CUDA image as a parent image
FROM nvidia/cuda:12.4.1-devel-ubuntu20.04

# Set the working directory
WORKDIR /app

# Install software-properties-common to add repositories
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common

# Add deadsnakes PPA for newer Python versions
RUN add-apt-repository ppa:deadsnakes/ppa

# Install necessary packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3.10 \
    python3.10-distutils \
    python3-pip \
    openmpi-bin \
    libopenmpi-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get update \
    && apt-get install -y python3.10-venv \
    && python3.10 -m venv venv_dev

RUN apt-get update \
    && apt-get install -y python3.10-dev

RUN . venv_dev/bin/activate \
    && python3 -m pip install -U pip \
    && pip3 install tensorrt_llm --pre --extra-index-url https://pypi.nvidia.com --timeout 3600

RUN apt-get install -y wget \
    && wget https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.sh \
    && chmod +x cmake-3.29.2-linux-x86_64.sh \
    && ./cmake-3.29.2-linux-x86_64.sh --skip-license --prefix=/usr/local

RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git tensorrt-llm \
    && cd tensorrt-llm \
    && ENV=/root/.bashrc bash docker/common/install_tensorrt.sh

RUN apt-get install -y vim git-lfs

RUN export PYTHONPATH=/app/tensorrt-llm/3rdparty/cutlass/python:$PYTHONPATH \
    && . /app/venv_dev/bin/activate \
    && cd tensorrt-llm \
    && git lfs install \
    && git lfs pull \
    && python scripts/build_wheel.py -c -D"TRT_INCLUDE_DIR=/usr/local/tensorrt/include" -D"TRT_LIB_DIR=/usr/local/tensorrt/lib"

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["bash", "echo Hello World!"]
```
It seems you are not using the official Dockerfile. Could you give it a try?
@byshiue
I followed the steps in https://nvidia.github.io/TensorRT-LLM/installation/linux.html to create a new docker environment and got a similar error:
```
Process 0 loading engine from /root/models/tmp/trt_engines/Meta-Llama-3-8B-Instruct/fp16/1-gpu-tp1/rank0.engine
[05/24/2024-08:20:11] [TRT] [I] Loaded engine size: 15323 MiB
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
The following tensorrt-related modules are installed:
```
root@e8fbc031fb35:~/TensorRT-LLM/examples/llama# pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-cu12             10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.11.0.dev2024052100
```
PS: I didn't find TensorRT under /usr/local/tensorrt/lib. Is it located somewhere else, or are additional steps needed?
Could you try following the guide here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation/build-from-source-linux.md#option-1-build-tensorrt-llm-in-one-step?
I built a new docker image with:
```shell
make release_build CUDA_ARCHS="80-real"
```
The image builds, and I can use it to convert and build with the following commands:
```shell
python ../llama/convert_checkpoint.py \
  --model_dir /mnt/memory/Meta-Llama-3-8B-Instruct \
  --output_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --dtype float16 \
  --use_weight_only \
  --weight_only_precision int4 \
  --load_model_on_cpu

trtllm-build \
  --checkpoint_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --output_dir /mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --gemm_plugin float16 \
  --gpt_attention_plugin float16 \
  --max_batch_size 1 \
  --max_input_len 2048 \
  --max_output_len 1024
```
Test loading the engine:
```python
import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load a TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp/rank0.engine'
load_engine(engine_path)
```
and get the following error:
```
[05/31/2024-00:22:08] [TRT] [I] Loaded engine size: 5342 MiB
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
Could you share the TRT version logs from the Python and C++ sides via:
```
$ pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-llm              0.11.0.dev2024052800
torch-tensorrt            2.3.0a0

$ cat /usr/local/tensorrt/include/NvInferVersion.h | grep version
//! Defines the TensorRT version
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.
```
```
$ cat /usr/local/tensorrt/include/NvInferVersion.h | grep version
//! Defines the TensorRT version
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.

$ pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-llm              0.11.0.dev2024052800
torch-tensorrt            2.3.0a0
```
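The Python wheel and C++ headers agree here (both 10.0.1), so the version check itself passes. That comparison can be automated; a small sketch that parses the `NV_TENSORRT_*` defines from `NvInferVersion.h` contents (the helper name and the inlined sample text are mine, for illustration):

```python
import re

def parse_nvinfer_version(header_text: str) -> str:
    """Extract "MAJOR.MINOR.PATCH" from the contents of NvInferVersion.h."""
    fields = dict(
        re.findall(r"#define NV_TENSORRT_(MAJOR|MINOR|PATCH)\s+(\d+)", header_text)
    )
    return "{MAJOR}.{MINOR}.{PATCH}".format(**fields)

# Sample lines as they appear in the header shown above.
sample = """
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
"""
print(parse_nvinfer_version(sample))  # 10.0.1
```

The printed value can then be compared against `pip show tensorrt` to confirm the two sides match.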
Could you add your trt_llm root folder to the PYTHONPATH environment variable and try again?
Do you mean /app/tensorrt_llm/? It looks like there's no Python-related content in that folder:
```
ll /app/tensorrt_llm/
total 12
drwxr-xr-x 1 root root   40 May 29 13:01 ./
drwxr-xr-x 1 root root   26 May 29 12:19 ../
-rw-rw-r-- 1 root root 5412 May 29 02:20 README.md
drwxr-xr-x 1 root root   17 Apr 12 08:53 benchmarks/
drwxr-xr-x 3 root root  108 Apr  9 06:19 docs/
drwxrwxrwx 1 root root 4096 May 29 12:05 examples/
drwxr-xr-x 3 root root   26 Apr  9 06:19 include/
lrwxrwxrwx 1 root root   57 May 29 13:01 lib -> /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
```
I checked out the source code into /app/tensorrt-llm-src and tried to load; I get the same error:
```
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:07:58] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:35] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:59] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:09:17] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
Did you resolve this?
I mean setting PYTHONPATH=tensorrt_llm_backend/tensorrt_llm after building tensorrt_llm in the docker image.
@byshiue what do you mean by tensorrt_llm_backend?
tensorrt_llm_backend means the root path of the repo you cloned from https://github.com/triton-inference-server/tensorrtllm_backend
@byshiue As in my reply above, /app/tensorrt_llm/ does not contain the full repo content, and I checked out the code into /app/tensorrt-llm-src. I tried both paths; both give the same error.
Could you try using the docker image nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 directly?
Hi @gloritygithub11, do you still have any further issues or questions? If not, we'll close this soon.