TensorRT-LLM

AttributeError: 'PluginConfig' object has no attribute '_streamingllm'. Did you mean: 'streamingllm'?

Open · BooHwang opened this issue 1 year ago · 6 comments

System Info

  • CPU: X86
  • GPU: NVIDIA L20
  • Python packages:
    • tensorrt 10.3.0
    • tensorrt-cu12 10.3.0
    • tensorrt-cu12-bindings 10.3.0
    • tensorrt-cu12-libs 10.3.0
    • tensorrt-llm 0.13.0.dev2024081300

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Step 1:

pip install -r requirements.txt
apt-get update
apt-get install git-lfs
rm -rf chatglm*

git clone https://huggingface.co/THUDM/chatglm3-6b-base chatglm3_6b_base

Step 2 (this step succeeds):

python3 convert_checkpoint.py --model_dir chatglm3_6b_base --output_dir trt_ckpt/chatglm3_6b_base/fp16/1-gpu

Step 3 (the error occurs at this step):

trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b_base/fp16/1-gpu \
        --gemm_plugin float16 \
        --output_dir trt_engines/chatglm3_6b_base/fp16/1-gpu

Expected behavior

Following the official example, the build should complete successfully. Instead, an error occurs at this step:

trtllm-build --checkpoint_dir trt_ckpt/ --gemm_plugin float16 --output_dir trt_engines

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.13.0.dev2024081300
[08/15/2024-06:04:26] [TRT-LLM] [I] Compute capability: (8, 9)
[08/15/2024-06:04:26] [TRT-LLM] [I] SM count: 92
[08/15/2024-06:04:26] [TRT-LLM] [I] SM clock: 2520 MHz
[08/15/2024-06:04:26] [TRT-LLM] [I] int4 TFLOPS: 474
[08/15/2024-06:04:26] [TRT-LLM] [I] int8 TFLOPS: 237
[08/15/2024-06:04:26] [TRT-LLM] [I] fp8 TFLOPS: 237
[08/15/2024-06:04:26] [TRT-LLM] [I] float16 TFLOPS: 118
[08/15/2024-06:04:26] [TRT-LLM] [I] bfloat16 TFLOPS: 118
[08/15/2024-06:04:26] [TRT-LLM] [I] float32 TFLOPS: 59
[08/15/2024-06:04:26] [TRT-LLM] [I] Total Memory: 44 GiB
[08/15/2024-06:04:26] [TRT-LLM] [I] Memory clock: 9001 MHz
[08/15/2024-06:04:26] [TRT-LLM] [I] Memory bus width: 384
[08/15/2024-06:04:26] [TRT-LLM] [I] Memory bandwidth: 864 GB/s
[08/15/2024-06:04:26] [TRT-LLM] [I] PCIe speed: 16000 Mbps
[08/15/2024-06:04:26] [TRT-LLM] [I] PCIe link width: 16
[08/15/2024-06:04:26] [TRT-LLM] [I] PCIe bandwidth: 32 GB/s
Traceback (most recent call last):
  File "/root/anaconda3/envs/trt_llm/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 528, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/root/anaconda3/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 394, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/root/anaconda3/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 361, in build_and_save
    engine = build_model(build_config,
  File "/root/anaconda3/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 303, in build_model
    assert not build_config.plugin_config.streamingllm or architecture == "LlamaForCausalLM", \
  File "/root/anaconda3/envs/trt_llm/lib/python3.10/site-packages/tensorrt_llm/plugin/plugin.py", line 79, in prop
    field_value = getattr(self, storage_name)
AttributeError: 'PluginConfig' object has no attribute '_streamingllm'. Did you mean: 'streamingllm'?
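
For context, here is a minimal, hypothetical reproduction of this failure mode. ConfigSketch is an invented name, only the attribute names are taken from the traceback above, and this is not the actual plugin.py source. A dataclass field declared with init=False and no default is never assigned on the instance, so a property that reads the private storage attribute raises exactly this AttributeError:

from dataclasses import dataclass, field

@dataclass
class ConfigSketch:
    # Excluded from the generated __init__ and given no default, so the
    # instance attribute "_streamingllm" is never created.
    _streamingllm: bool = field(init=False)

    @property
    def streamingllm(self):
        # Mirrors the generic getter pattern shown in the traceback:
        # read the private storage attribute by name.
        return getattr(self, "_streamingllm")

cfg = ConfigSketch()
cfg.streamingllm  # AttributeError: 'ConfigSketch' object has no attribute '_streamingllm'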


BooHwang · Aug 15 '24 06:08

Could you try the workaround referenced here? https://github.com/NVIDIA/TensorRT-LLM/issues/1968#issuecomment-2252750163

Kefeng-Duan · Aug 15 '24 09:08

I fixed this issue by editing plugin/plugin.py and changing all the dataclass fields in PluginConfig to have init=True
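
A hedged sketch of the kind of edit described above, shown on the same hypothetical ConfigSketch stand-in rather than the real plugin.py (field names assumed from the traceback). With init=True and a default, the generated __init__ assigns the instance attribute, so the property getter can find it:

from dataclasses import dataclass, field

@dataclass
class ConfigSketch:
    # init=True plus a default value means the generated __init__ now
    # sets the instance attribute, so the property below no longer raises.
    _streamingllm: bool = field(default=False, init=True)

    @property
    def streamingllm(self):
        return getattr(self, "_streamingllm")

cfg = ConfigSketch()
cfg.streamingllm  # False; the attribute now exists on the instance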

aninrusimha · Aug 15 '24 22:08

Could you try the workaround referenced here? #1968 (comment)

I had tried that before reporting this issue; it did not work for me.

BooHwang · Aug 16 '24 03:08

I fixed this issue by editing plugin/plugin.py and changing all the dataclass fields in PluginConfig to have init=True

It works for me, but the engine still needs to be verified for correctness. Thanks for your help.

BooHwang · Aug 16 '24 04:08

@BooHwang Sorry, could you try passing --streamingllm enable when building the engine? See https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#run-llama-with-streamingllm

Kefeng-Duan · Aug 19 '24 08:08

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] · Oct 05 '24 02:10

I fixed this issue by editing plugin/plugin.py and changing all the dataclass fields in PluginConfig to have init=True

It works, thanks.

zhangyu68 · Dec 13 '24 12:12

This is quite an old issue, but for reference: I've confirmed that the two commands below now run without triggering the reported error.

# convert
python ${TRTLLM_DIR}/examples/models/core/glm-4-9b/convert_checkpoint.py \
        --model_dir chatglm3_6b_base \
        --output_dir chatglm3_6b_fp16_1_gpu

# build
trtllm-build --checkpoint_dir chatglm3_6b_fp16_1_gpu \
        --gemm_plugin float16 \
        --output_dir chatglm3_6b_fp16_1_gpu_built

I'll close this ticket, but feel free to open another issue if needed~ :)

karljang · Sep 22 '25 18:09