getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
System Info
```
tensorrt                  10.0.1
tensorrt-cu12             10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.10.0.dev2024050700
```
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Build with the following script; the build succeeds:
```shell
set -e
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=fp16
export DTYPE=bfloat16
export TP_SIZE=4

python ../llama/convert_checkpoint.py \
  --model_dir $MODEL_DIR/${MODEL_NAME} \
  --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --dtype $DTYPE \
  --tp_size $TP_SIZE

trtllm-build \
  --checkpoint_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --output_dir $MODEL_DIR/tmp/trt_engines/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
  --gemm_plugin $DTYPE \
  --gpt_attention_plugin $DTYPE \
  --use_fused_mlp \
  --max_batch_size 1 \
  --max_input_len 2048 \
  --max_output_len 1024
```
Load the engine:
```python
import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load a TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank{rank}.engine'
print(f"Process {rank} loading engine from {engine_path}")
load_engine(engine_path)
```
and get the following error:
```
Process 0 loading engine from /mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank0.engine
[05/13/2024-02:59:35] [TRT] [I] Loaded engine size: 22480 MiB
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
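The error above means the TensorRT plugin registry has no creator for `Gemm` in the `tensorrt_llm` namespace: a plain `import tensorrt` never loads the TensorRT-LLM plugin library, so its creators are never registered. Below is a minimal sketch of registering them by hand before deserializing. It is an assumption-laden illustration, not the official API: it assumes the shared library exposes an `initTrtLlmPlugins(void*, const char*)` entry point (as the TensorRT-LLM sources do) and that the library is on the loader path; the path argument here is only an example.

```python
import ctypes

def register_trtllm_plugins(lib_path="libnvinfer_plugin_tensorrt_llm.so"):
    """Load the TensorRT-LLM plugin library and register its plugin creators.

    Returns True on success, False if the library cannot be loaded or does
    not expose the expected initTrtLlmPlugins entry point.
    """
    try:
        # RTLD_GLOBAL so symbols are visible to TensorRT when it resolves creators
        handle = ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return False
    if not hasattr(handle, "initTrtLlmPlugins"):
        return False
    # initTrtLlmPlugins(logger, namespace) registers all TensorRT-LLM
    # plugin creators under the "tensorrt_llm" namespace.
    handle.initTrtLlmPlugins.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    handle.initTrtLlmPlugins.restype = ctypes.c_bool
    return bool(handle.initTrtLlmPlugins(None, b"tensorrt_llm"))
```

In practice, simply doing `import tensorrt_llm` in the loading script should have the same effect, since the package loads this library on import.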
Expected behavior
The engine should load successfully.
Actual behavior
The engine fails to load.
Additional notes
Plugin directory:
```
llama git:(trtllm-build2) ✗ ll /app/tensorrt-llm/cpp/build/tensorrt_llm/plugins
total 447M
drwxr-xr-x 3 root root  106 May 10 11:17 CMakeFiles
-rw-r--r-- 1 root root  52K May 10 11:17 Makefile
drwxr-xr-x 3 root root   67 May 10 11:17 bertAttentionPlugin
-rw-r--r-- 1 root root 5.3K May 10 11:17 cmake_install.cmake
drwxr-xr-x 3 root root   67 May 10 11:17 common
drwxr-xr-x 3 root root   67 May 10 11:17 cumsumLastDimPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionCommon
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 identityPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 layernormQuantizationPlugin
lrwxrwxrwx 1 root root   36 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so -> libnvinfer_plugin_tensorrt_llm.so.10
lrwxrwxrwx 1 root root   40 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10 -> libnvinfer_plugin_tensorrt_llm.so.10.0.1
-rwxr-xr-x 1 root root 447M May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10.0.1
drwxr-xr-x 3 root root   67 May 10 11:17 lookupPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 loraPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 lruPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mambaConv1dPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mixtureOfExperts
drwxr-xr-x 3 root root   67 May 10 11:17 ncclPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizePerTokenPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizeTensorPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 rmsnormQuantizationPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 selectiveScanPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 smoothQuantGemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyGroupwiseQuantMatmulPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyQuantMatmulPlugin
```
List the registered plugins with this script:
```python
import tensorrt as trt

# Initialize the TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def list_plugins():
    plugin_registry = trt.get_plugin_registry()
    if plugin_registry is None:
        print("No plugin registry found.")
        return
    plugin_creators = plugin_registry.plugin_creator_list
    num_plugins = len(plugin_creators)
    print(f"Number of registered plugins: {num_plugins}")
    for i, plugin_creator in enumerate(plugin_creators):
        print(f"Plugin {i + 1}, Name: {plugin_creator.name}, Version: {plugin_creator.plugin_version}")

if __name__ == "__main__":
    list_plugins()
```
which prints the following plugin list:
```
Number of registered plugins: 30
Plugin 1, Name: CaskDeconvV2RunnerWeightsTransformerPlugin, Version: 1
Plugin 2, Name: CaskDeconvV1RunnerWeightsTransformerPlugin, Version: 1
Plugin 3, Name: CaskConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 4, Name: CaskFlattenConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 5, Name: CaskConvActPoolWeightsTransformerPlugin, Version: 1
Plugin 6, Name: CaskDepSepConvWeightsTransformerPlugin, Version: 1
Plugin 7, Name: MyelinWeightsTransformPlugin, Version: 1
Plugin 8, Name: DisentangledAttention_TRT, Version: 1
Plugin 9, Name: CustomEmbLayerNormPluginDynamic, Version: 1
Plugin 10, Name: CustomEmbLayerNormPluginDynamic, Version: 2
Plugin 11, Name: CustomEmbLayerNormPluginDynamic, Version: 3
Plugin 12, Name: CustomFCPluginDynamic, Version: 1
Plugin 13, Name: CustomGeluPluginDynamic, Version: 1
Plugin 14, Name: GroupNormalizationPlugin, Version: 1
Plugin 15, Name: CustomSkipLayerNormPluginDynamic, Version: 3
Plugin 16, Name: CustomSkipLayerNormPluginDynamic, Version: 4
Plugin 17, Name: CustomSkipLayerNormPluginDynamic, Version: 1
Plugin 18, Name: CustomSkipLayerNormPluginDynamic, Version: 2
Plugin 19, Name: RnRes2Br1Br2c_TRT, Version: 1
Plugin 20, Name: RnRes2Br1Br2c_TRT, Version: 2
Plugin 21, Name: RnRes2Br2bBr2c_TRT, Version: 1
Plugin 22, Name: RnRes2Br2bBr2c_TRT, Version: 2
Plugin 23, Name: RnRes2FullFusion_TRT, Version: 1
Plugin 24, Name: SingleStepLSTMPlugin, Version: 1
Plugin 25, Name: CustomQKVToContextPluginDynamic, Version: 3
Plugin 26, Name: CustomQKVToContextPluginDynamic, Version: 1
Plugin 27, Name: CustomQKVToContextPluginDynamic, Version: 2
Plugin 28, Name: DLRM_BOTTOM_MLP_TRT, Version: 1
Plugin 29, Name: SmallTileGEMM_TRT, Version: 1
Plugin 30, Name: RNNTEncoderPlugin, Version: 1
```
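Note that this list contains only TensorRT's built-in plugins, none from the `tensorrt_llm` namespace. The odd-looking name in the error, `Gemmtensorrt_llm`, appears to be the plugin name `Gemm` immediately followed by its namespace `tensorrt_llm`; a purely illustrative sketch of that reading (the function name is made up for this example):

```python
def missing_plugin_message(name: str, namespace: str, version: int) -> str:
    # Reconstructs the "could not find plugin" key as the log shows it:
    # the plugin name concatenated with its registry namespace.
    return f"getPluginCreator could not find plugin: {name}{namespace} version: {version}"

print(missing_plugin_message("Gemm", "tensorrt_llm", 1))
```

So the engine is looking up the creator `("Gemm", "tensorrt_llm", 1)`, which is absent from the registry above.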
It is caused by a mismatch of the TRT version. Have you rebuilt the docker image after upgrading TensorRT-LLM to 0.10.0? TensorRT-LLM 0.10.0 uses TensorRT 10, while older TensorRT-LLM uses TensorRT 9.
@byshiue yes, I've rebuilt the docker image. As the listing above shows, TensorRT is already 10.0.1: libnvinfer_plugin_tensorrt_llm.so.10.0.1
```
ll /usr/local/tensorrt/lib/
total 3.5G
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so -> libnvinfer.so.10.0.1
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so.10 -> libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 224M Apr 15 23:25 libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 1.3G Apr 15 23:26 libnvinfer_builder_resource.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so -> libnvinfer_dispatch.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so.10 -> libnvinfer_dispatch.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:22 libnvinfer_dispatch.so.10.0.1
-rw-r--r-- 1 root root 751K Apr 15 23:22 libnvinfer_dispatch_static.a
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so -> libnvinfer_lean.so.10.0.1
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so.10 -> libnvinfer_lean.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:22 libnvinfer_lean.so.10.0.1
-rw-r--r-- 1 root root 243M Apr 15 23:22 libnvinfer_lean_static.a
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so -> libnvinfer_plugin.so.10.0.1
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so.10 -> libnvinfer_plugin.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:26 libnvinfer_plugin.so.10.0.1
-rw-r--r-- 1 root root  37M Apr 15 23:26 libnvinfer_plugin_static.a
-rw-r--r-- 1 root root 1.7G Apr 15 23:26 libnvinfer_static.a
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so -> libnvinfer_vc_plugin.so.10.0.1
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so.10 -> libnvinfer_vc_plugin.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:26 libnvinfer_vc_plugin.so.10.0.1
-rw-r--r-- 1 root root 442K Apr 15 23:26 libnvinfer_vc_plugin_static.a
lrwxrwxrwx 1 root root   21 Apr 15 23:26 libnvonnxparser.so -> libnvonnxparser.so.10
lrwxrwxrwx 1 root root   25 Apr 15 23:26 libnvonnxparser.so.10 -> libnvonnxparser.so.10.0.1
-rwxr-xr-x 1 root root 3.4M Apr 15 23:22 libnvonnxparser.so.10.0.1
-rw-r--r-- 1 root root  19M Apr 15 23:22 libnvonnxparser_static.a
-rw-r--r-- 1 root root 675K Apr 15 23:26 libonnx_proto.a
drwxr-xr-x 2 root root  168 Apr 15 23:26 stubs
```
How do you build the docker image and the tensorrt_llm?
With the following Dockerfile:
```dockerfile
# Use an official NVIDIA CUDA image as a parent image
FROM nvidia/cuda:12.4.1-devel-ubuntu20.04

# Set the working directory
WORKDIR /app

# Install software-properties-common to add repositories
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common

# Add deadsnakes PPA for newer Python versions
RUN add-apt-repository ppa:deadsnakes/ppa

# Install necessary packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3.10 \
    python3.10-distutils \
    python3-pip \
    openmpi-bin \
    libopenmpi-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get update \
    && apt-get install -y python3.10-venv \
    && python3.10 -m venv venv_dev

RUN apt-get update \
    && apt-get install -y python3.10-dev

RUN . venv_dev/bin/activate \
    && python3 -m pip install -U pip \
    && pip3 install tensorrt_llm --pre --extra-index-url https://pypi.nvidia.com --timeout 3600

RUN apt-get install -y wget \
    && wget https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.sh \
    && chmod +x cmake-3.29.2-linux-x86_64.sh \
    && ./cmake-3.29.2-linux-x86_64.sh --skip-license --prefix=/usr/local

RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git tensorrt-llm \
    && cd tensorrt-llm \
    && ENV=/root/.bashrc bash docker/common/install_tensorrt.sh

RUN apt-get install -y vim git-lfs

RUN export PYTHONPATH=/app/tensorrt-llm/3rdparty/cutlass/python:$PYTHONPATH \
    && . /app/venv_dev/bin/activate \
    && cd tensorrt-llm \
    && git lfs install \
    && git lfs pull \
    && python scripts/build_wheel.py -c -D"TRT_INCLUDE_DIR=/usr/local/tensorrt/include" -D"TRT_LIB_DIR=/usr/local/tensorrt/lib"

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["bash", "echo Hello World!"]
```
It seems you are not using the official Dockerfile. Could you give it a try?
@byshiue
I followed the steps in https://nvidia.github.io/TensorRT-LLM/installation/linux.html to create a new docker environment and got a similar error:
```
Process 0 loading engine from /root/models/tmp/trt_engines/Meta-Llama-3-8B-Instruct/fp16/1-gpu-tp1/rank0.engine
[05/24/2024-08:20:11] [TRT] [I] Loaded engine size: 15323 MiB
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
The following tensorrt-related modules are installed:
```
root@e8fbc031fb35:~/TensorRT-LLM/examples/llama# pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-cu12             10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.11.0.dev2024052100
```
PS: I didn't find TensorRT under /usr/local/tensorrt/lib. Is it located somewhere else, or are additional steps needed?
Could you try following the guide here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation/build-from-source-linux.md#option-1-build-tensorrt-llm-in-one-step?
I built a new docker image with:
```shell
make release_build CUDA_ARCHS="80-real"
```
The image builds, and I can use it to convert and build with the following commands:
```shell
python ../llama/convert_checkpoint.py \
  --model_dir /mnt/memory/Meta-Llama-3-8B-Instruct \
  --output_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --dtype float16 \
  --use_weight_only \
  --weight_only_precision int4 \
  --load_model_on_cpu

trtllm-build \
  --checkpoint_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --output_dir /mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
  --gemm_plugin float16 \
  --gpt_attention_plugin float16 \
  --max_batch_size 1 \
  --max_input_len 2048 \
  --max_output_len 1024
```
Test loading the engine:
```python
import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load a TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp/rank0.engine'
load_engine(engine_path)
```
and get the following error:
```
[05/31/2024-00:22:08] [TRT] [I] Loaded engine size: 5342 MiB
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
Could you share the TRT version logs from the Python and C++ sides via:
```
$ pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-llm              0.11.0.dev2024052800
torch-tensorrt            2.3.0a0

$ cat /usr/local/tensorrt/include/NvInferVersion.h | grep version
//! Defines the TensorRT version
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.
```
```
$ cat /usr/local/tensorrt/include/NvInferVersion.h | grep version
//! Defines the TensorRT version
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.

$ pip list | grep tensorrt
tensorrt                  10.0.1
tensorrt-llm              0.11.0.dev2024052800
torch-tensorrt            2.3.0a0
```
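The Python wheel and C++ headers agree here (both 10.0.1), so the version check itself passes. That comparison can be automated; a small sketch that parses the `NV_TENSORRT_*` defines from `NvInferVersion.h` contents (the helper name and the inlined sample text are mine, for illustration):

```python
import re

def parse_nvinfer_version(header_text: str) -> str:
    """Extract "MAJOR.MINOR.PATCH" from the contents of NvInferVersion.h."""
    fields = dict(
        re.findall(r"#define NV_TENSORRT_(MAJOR|MINOR|PATCH)\s+(\d+)", header_text)
    )
    return "{MAJOR}.{MINOR}.{PATCH}".format(**fields)

# Sample lines as they appear in the header shown above.
sample = """
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
"""
print(parse_nvinfer_version(sample))  # 10.0.1
```

The printed value can then be compared against `pip show tensorrt` to confirm the two sides match.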
Could you add your trt_llm root folder to the PYTHONPATH environment variable and try again?
Do you mean /app/tensorrt_llm/? It looks like there's no Python-related content in that folder:
```
ll /app/tensorrt_llm/
total 12
drwxr-xr-x 1 root root   40 May 29 13:01 ./
drwxr-xr-x 1 root root   26 May 29 12:19 ../
-rw-rw-r-- 1 root root 5412 May 29 02:20 README.md
drwxr-xr-x 1 root root   17 Apr 12 08:53 benchmarks/
drwxr-xr-x 3 root root  108 Apr  9 06:19 docs/
drwxrwxrwx 1 root root 4096 May 29 12:05 examples/
drwxr-xr-x 3 root root   26 Apr  9 06:19 include/
lrwxrwxrwx 1 root root   57 May 29 13:01 lib -> /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
```
I checked out the source code into /app/tensorrt-llm-src and tried to load; I get the same error:
```
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:07:58] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:35] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:59] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:09:17] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
```
Did you resolve this?
I mean setting PYTHONPATH=tensorrt_llm_backend/tensorrt_llm after building tensorrt_llm in the docker image.
@byshiue what do you mean by tensorrt_llm_backend?
tensorrt_llm_backend means the root path of the repo you cloned from https://github.com/triton-inference-server/tensorrtllm_backend
@byshiue As in my reply above, /app/tensorrt_llm/ does not contain the full repo content, and I checked out the code into /app/tensorrt-llm-src. I tried both paths; both give the same error.
Could you try using the docker image nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 directly?
Hi @gloritygithub11, do you still have any further issues or questions? If not, we'll close this soon.