getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm
System Info
- A100 40G
- tensorrt 10.0.1
- tensorrt-llm 0.10.0.dev2024050700
Who can help?
@Tracin
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I can build the Mixtral 8x7B engine with:
export MODEL_NAME="mixtral-8x7b-instruct-v0.1"
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=W4A16
export DTYPE=float16
export PYTHONPATH=<path to tensor rt>/TensorRT-LLM:$PYTHONPATH
python ../llama/convert_checkpoint.py \
    --model_dir /home/ateam/xu.xiaodong/models/mixtral-8x7b-instruct-v0.1 \
    --output_dir tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --use_weight_only \
    --weight_only_precision int4 \
    --dtype $DTYPE \
    --workers 8 \
    --load_model_on_cpu
trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/$PRECISION/1-gpu \
    --gemm_plugin $DTYPE \
    --gpt_attention_plugin $DTYPE \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024 \
    --max_multimodal_len 576
Loading the engine then fails:
@staticmethod
def load_engine(engine_path):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = runtime.deserialize_cuda_engine(engine_data)
    if engine is None:
        print("Failed to deserialize the engine.")
    return engine
I get this error:
[05/09/2024-06:24:33] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/09/2024-06:24:33] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/09/2024-06:24:33] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
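The identifier in this log is easier to read once it is split. TensorRT appears to print the plugin name and the plugin namespace back to back, so `WeightOnlyQuantMatmultensorrt_llm` would be the plugin `WeightOnlyQuantMatmul` in the namespace `tensorrt_llm`. The sketch below parses such a line under that assumption. `parse_missing_plugin`, `MissingPlugin`, and `ASSUMED_NAMESPACE` are hypothetical names used for illustration only; they are not part of TensorRT or TensorRT-LLM.

```python
import re
from typing import NamedTuple, Optional

# Assumption: TensorRT-LLM registers its plugins under this namespace.
ASSUMED_NAMESPACE = "tensorrt_llm"

_PATTERN = re.compile(r"could not find plugin: (\S+) version: (\S+)")


class MissingPlugin(NamedTuple):
    name: str
    namespace: str
    version: str


def parse_missing_plugin(line: str,
                         namespace: str = ASSUMED_NAMESPACE) -> Optional[MissingPlugin]:
    """Split the fused '<name><namespace>' token of a getPluginCreator error."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    fused, version = match.groups()
    if namespace and fused.endswith(namespace):
        # Strip the assumed namespace suffix to recover the bare plugin name.
        return MissingPlugin(fused[:-len(namespace)], namespace, version)
    # No known namespace suffix: report the token unchanged.
    return MissingPlugin(fused, "", version)


log_line = ("[05/09/2024-06:24:33] [TRT] [E] 3: getPluginCreator could not find "
            "plugin: WeightOnlyQuantMatmultensorrt_llm version: 1")
print(parse_missing_plugin(log_line))
```

Read this way, the message is not about a missing file but about a plugin creator that was not registered in the process that deserialized the engine.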
Expected behavior
The engine should load without error.
actual behavior
Engine loading fails because WeightOnlyQuantMatmultensorrt_llm is not found.
additional notes
The build and the load run on the same machine. I cannot find any file named WeightOnlyQuantMatmultensorrt_llm on that machine.
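A plugin is not a standalone file on disk: TensorRT looks creators up in its in-process plugin registry, and the error above means no matching creator was registered when the engine was deserialized. Below is a minimal sketch for checking that. It assumes the plugin name `WeightOnlyQuantMatmul` and the namespace `tensorrt_llm`, and it assumes that `import tensorrt_llm` registers the TensorRT-LLM plugins as a side effect, which this thread does not confirm. `has_creator` and `trtllm_plugin_registered` are hypothetical helper names.

```python
from typing import Any, Iterable


def has_creator(creators: Iterable[Any], name: str, namespace: str,
                version: str = "1") -> bool:
    """Return True if any creator matches name, namespace and version."""
    return any(
        c.name == name
        and c.plugin_namespace == namespace
        and c.plugin_version == version
        for c in creators
    )


def trtllm_plugin_registered(name: str = "WeightOnlyQuantMatmul",
                             namespace: str = "tensorrt_llm") -> bool:
    """Query the live TensorRT registry (needs tensorrt and tensorrt_llm)."""
    import tensorrt as trt
    # Assumption: importing tensorrt_llm loads its plugin library and
    # registers its creators; without it they may be absent from the registry.
    import tensorrt_llm  # noqa: F401

    registry = trt.get_plugin_registry()
    return has_creator(registry.plugin_creator_list, name, namespace)
```

If `trtllm_plugin_registered()` returns `False` in the same process that calls `load_engine`, deserialization would be expected to fail in the way shown above.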
Did you rebuild the docker image when you switched to the 0.10.0-dev branch? That branch upgrades TensorRT from 9 to 10, so the docker image needs to be updated too.
@byshiue I don't use docker for development. I ran docker/common/install_tensorrt.sh in my local environment, and TensorRT has already been upgraded to 10.0.1.
I also tried rebuilding with the docker image and got the same error.
It is hard to help if we cannot reproduce the issue on our side.
Since this looks like an environment issue, I suggest removing the whole project/repo, cloning a fresh copy, installing TensorRT 10 first, and then building tensorrt_llm again.
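Before rebuilding, it may be worth confirming which TensorRT version the Python environment actually sees, since the 0.10.0-dev branch is said above to need TensorRT 10. The sketch below uses only the standard library. The helper names are hypothetical, and the distribution name `tensorrt` is an assumption about how the package was installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: Optional[str]) -> Optional[int]:
    """Extract the leading integer of a version such as '10.0.1'."""
    if not version:
        return None
    head = version.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def tensorrt_major_is(required_major: int = 10,
                      dist_name: str = "tensorrt") -> bool:
    """True if the installed TensorRT distribution has the required major version."""
    return major_version(installed_version(dist_name)) == required_major


if __name__ == "__main__":
    print("tensorrt:", installed_version("tensorrt"))
    print("TensorRT 10 installed:", tensorrt_major_is(10))
```

A stale TensorRT 9 wheel left in the environment would show up here even if the TensorRT 10 libraries are present under /usr/local/tensorrt.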
@gloritygithub11 do you still have any further issues or questions? If not, we'll close this soon.