getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm

Open gloritygithub11 opened this issue 1 year ago • 4 comments

System Info

  • A100 40G
  • tensorrt 10.0.1
  • tensorrt-llm 0.10.0.dev2024050700

Who can help?

@Tracin

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I can build the Mixtral 8x7B engine with:


export MODEL_NAME="mixtral-8x7b-instruct-v0.1"
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=W4A16
export DTYPE=float16
export PYTHONPATH=<path to tensor rt>/TensorRT-LLM:$PYTHONPATH


python ../llama/convert_checkpoint.py \
    --model_dir /home/ateam/xu.xiaodong/models/mixtral-8x7b-instruct-v0.1 \
    --output_dir tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --use_weight_only \
    --weight_only_precision int4 \
    --dtype $DTYPE \
    --workers 8 \
    --load_model_on_cpu
    

trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/$PRECISION/1-gpu \
    --gemm_plugin $DTYPE \
    --gpt_attention_plugin $DTYPE \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024 \
    --max_multimodal_len 576

Loading the engine then fails:

    @staticmethod
    def load_engine(engine_path):
        TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
        runtime = trt.Runtime(TRT_LOGGER)

        with open(engine_path, 'rb') as f:
            engine_data = f.read()

        engine = runtime.deserialize_cuda_engine(engine_data)
        if engine is None:
            print("Failed to deserialize the engine.")
        return engine

I get the following error:

    [05/09/2024-06:24:33] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
    [05/09/2024-06:24:33] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
    [05/09/2024-06:24:33] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
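The registry error is consistent with the TensorRT-LLM plugin library not being loaded in the process that deserializes the engine: the loader shown earlier creates a bare `trt.Runtime`, and nothing in it registers TensorRT-LLM's plugin creators. The following is a minimal sketch of a loader that imports the plugin package first. It assumes that importing `tensorrt_llm` registers its plugins as a side effect, which is not confirmed in this thread, and `load_engine_with_plugins` is a hypothetical name.

```python
import importlib


def load_engine_with_plugins(engine_path, plugin_package="tensorrt_llm"):
    """Deserialize a TensorRT engine after importing the plugin package.

    Assumption: importing `plugin_package` registers its plugin creators
    with TensorRT's plugin registry as a side effect.
    """
    # Imported lazily so this module can be loaded without TensorRT installed.
    import tensorrt as trt

    # Must happen before deserialization, otherwise the creators are unknown.
    importlib.import_module(plugin_package)

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    with open(engine_path, "rb") as f:
        engine_data = f.read()

    engine = runtime.deserialize_cuda_engine(engine_data)
    if engine is None:
        raise RuntimeError(f"Failed to deserialize the engine at {engine_path}")
    return engine
```

Raising instead of printing makes a failed load visible to the caller, which would otherwise receive `None`.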

Expected behavior

The engine loads without error.

Actual behavior

Engine loading fails because WeightOnlyQuantMatmultensorrt_llm is not found.

Additional notes

The build and the load run on the same machine. I cannot find a file named WeightOnlyQuantMatmultensorrt_llm on the machine.
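The string in the error is probably not a file name. TensorRT appears to print the plugin name immediately followed by its namespace, which would make this the `WeightOnlyQuantMatmul` plugin in the `tensorrt_llm` namespace, looked up in the in-process plugin registry rather than on disk. The sketch below checks whether that creator is registered. The concatenated format is an assumption, and `split_plugin_id`, `registered_plugins`, and `is_registered` are hypothetical helper names.

```python
def split_plugin_id(plugin_id, namespace="tensorrt_llm"):
    """Split a concatenated '<name><namespace>' string from the error message."""
    if namespace and plugin_id.endswith(namespace):
        return plugin_id[: -len(namespace)], namespace
    return plugin_id, ""


def registered_plugins():
    """Return (name, namespace, version) for each creator in the TensorRT registry."""
    # Imported lazily: this call requires a TensorRT installation.
    import tensorrt as trt

    registry = trt.get_plugin_registry()
    return [
        (creator.name, creator.plugin_namespace, creator.plugin_version)
        for creator in registry.plugin_creator_list
    ]


def is_registered(plugin_id, namespace="tensorrt_llm"):
    """Check whether the plugin named in an error message has a registered creator."""
    name, ns = split_plugin_id(plugin_id, namespace)
    return any(n == name and s == ns for n, s, _ in registered_plugins())
```

Calling `is_registered("WeightOnlyQuantMatmultensorrt_llm")` in the same process, right before `deserialize_cuda_engine`, shows whether the creator is present at the point where loading fails.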

gloritygithub11, May 09 '24 07:05

Did you rebuild the Docker image when you switched to the 0.10.0-dev branch? This branch upgrades TensorRT from 9 to 10, so you need to update the Docker image too.

byshiue, May 10 '24 08:05

@byshiue I don't use Docker for development. I ran docker/common/install_tensorrt.sh in my local environment, and TensorRT has already been upgraded to 10.0.1.

gloritygithub11, May 10 '24 08:05

I tried rebuilding with the Docker image and got the same error.

gloritygithub11, May 12 '24 03:05

It is hard to help if we cannot reproduce the issue on our side.

Because this is most likely an environment issue, I suggest removing the whole project/repo, cloning a fresh copy, installing TensorRT 10 first, and then building tensorrt_llm again.

byshiue, May 17 '24 08:05
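Before reinstalling, it may help to confirm which versions the Python environment actually resolves, since a stale TensorRT 9 package left alongside a TensorRT 10 build would fit the environment-issue hypothesis. A small sketch using only the standard library; the distribution names `tensorrt` and `tensorrt_llm` are assumptions about how the packages were installed, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages=("tensorrt", "tensorrt_llm")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment (or installed under another name).
            versions[name] = None
    return versions
```

Running `installed_versions()` in the environment that builds the engine and again in the one that loads it makes a version mismatch between the two easy to spot.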

@gloritygithub11 Do you still have any further issues or questions? If not, we'll close this issue soon.

nv-guomingz, Nov 14 '24 03:11