When a model has layers with and without the GPT attention plugin enabled, GptSession raises an error
System Info
- TensorRT-LLM: latest main branch, built in the Triton TRT-LLM container (23.12)
- GPU: V100
Who can help?
@byshiue
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Copy the official `Attention` class and modify it slightly to disable the GPT attention plugin, then use the modified class only for the last decoder layer:

```python
class CustomAttention(Attention):
    def forward(...):
        ...
        # Force the non-plugin attention path.
        if False and default_net().plugin_config.gpt_attention_plugin:
            ...

class GPTNeoXDecoderLayer(Module):
    def __init__(...):
        ...
        # Use the plugin-free attention only for the last layer.
        if layer_idx == config.num_hidden_layers - 1:
            attn_cls = CustomAttention
        else:
            attn_cls = Attention
        self.attention = attn_cls(
            ...
```
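For clarity, the per-layer class selection above follows this pattern (a minimal self-contained sketch; the classes below are stand-ins, not the real TensorRT-LLM `Attention`/`CustomAttention`):

```python
# Minimal sketch of selecting a different attention class for the last layer.
# `Attention` and `CustomAttention` are stand-ins for the TensorRT-LLM classes.

class Attention:
    uses_gpt_plugin = True

class CustomAttention(Attention):
    uses_gpt_plugin = False  # plugin path disabled, as in the repro above

def build_layers(num_hidden_layers):
    layers = []
    for layer_idx in range(num_hidden_layers):
        # Only the final layer gets the plugin-free attention.
        attn_cls = CustomAttention if layer_idx == num_hidden_layers - 1 else Attention
        layers.append(attn_cls())
    return layers

layers = build_layers(4)
print([type(layer).__name__ for layer in layers])
# -> ['Attention', 'Attention', 'Attention', 'CustomAttention']
```

The result is a single engine in which all layers but the last use the GPT attention plugin.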
- Build the engine:

```shell
trtllm-build --checkpoint_dir ./trt_ckpt/fp16/4p/ \
    --use_gemm_plugin float16 \
    --use_gpt_attention_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024 \
    --workers 4 \
    --output_dir ./trt_engines/fp16/4p_customattn/
```
- Run the engine with both `GptSession` (C++ session) and the Python session:

```shell
mpirun -n 4 --allow-run-as-root \
    python ../run.py --max_output_len=20 \
    --engine_dir=./trt_engines/fp16/4p_customattn/ \
    --tokenizer_dir=./gpt-neox-20b
```

```shell
mpirun -n 4 --allow-run-as-root \
    python ../run.py --max_output_len=20 \
    --engine_dir=./trt_engines/fp16/4p_customattn/ \
    --tokenizer_dir=./gpt-neox-20b \
    --use_py_session
```
Expected behavior
The engine can be run with both `GptSession` and the Python session (`--use_py_session`).
Actual behavior

The engine builds successfully and runs normally with the Python session, producing the following output:

```
Input [Text 0]: "Born in north-east France, Soyer trained as a"
Output [Text 0 Beam 0]: " chef in the kitchens of the Chateaux of the Loire, and then in the kitchen"
```

However, running it with `GptSession` raises a RuntimeError:
```
[TensorRT-LLM][ERROR] 7: [shapeMachine.cpp::executeContinuation::887] Error Code 7: Internal Error (Dimensions with name past_key_len must be equal. Condition '==' violated: 32 != 44. Instruction: CHECK_EQUAL 32 44.)
[TensorRT-LLM][ERROR] 7: [shapeMachine.cpp::executeContinuation::887] Error Code 7: Internal Error (Dimensions with name past_key_len must be equal. Condition '==' violated: 32 != 44. Instruction: CHECK_EQUAL 32 44.)
[TensorRT-LLM][ERROR] 7: [shapeMachine.cpp::executeContinuation::887] Error Code 7: Internal Error (Dimensions with name past_key_len must be equal. Condition '==' violated: 32 != 44. Instruction: CHECK_EQUAL 32 44.)
[TensorRT-LLM][ERROR] 7: [shapeMachine.cpp::executeContinuation::887] Error Code 7: Internal Error (Dimensions with name past_key_len must be equal. Condition '==' violated: 32 != 44. Instruction: CHECK_EQUAL 32 44.)
Traceback (most recent call last):
  File "/home/ma-user/work/TensorRT-LLM/examples/gptneox/../run.py", line 496, in <module>
    main(args)
  File "/home/ma-user/work/TensorRT-LLM/examples/gptneox/../run.py", line 374, in main
Traceback (most recent call last):
  File "/home/ma-user/work/TensorRT-LLM/examples/gptneox/../run.py", line 496, in <module>
    outputs = runner.generate(
  File "/home/ma-user/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 338, in generate
    self.session.generate(generation_output, generation_input,
RuntimeError: Invalid input shape (/home/ma-user/work/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:180)
1       0x7f6b8a7ec6a3 /home/ma-user/.local/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x9c6a3) [0x7f6b8a7ec6a3]
2       0x7f6b8a88f1f0 tensorrt_llm::runtime::GptSession::executeGenerationStep(int, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager*, std::vector<bool, std::allocator<bool> >&) + 992
3       0x7f6b8a891148 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3368
4       0x7f6b8a8929f8 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 3080
5       0x7f6b8a824459 /home/ma-user/.local/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xd4459) [0x7f6b8a824459]
6       0x7f6b8a80c47e /home/ma-user/.local/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xbc47e) [0x7f6b8a80c47e]
7       0x5609c0fe310e python(+0x15a10e) [0x5609c0fe310e]
8       0x5609c0fd9a7b _PyObject_MakeTpCall + 603
9       0x5609c0ff1acb python(+0x168acb) [0x5609c0ff1acb]
10      0x5609c0fd1cfa _PyEval_EvalFrameDefault + 24906
11      0x5609c0ff17f1 python(+0x1687f1) [0x5609c0ff17f1]
12      0x5609c0ff2492 PyObject_Call + 290
13      0x5609c0fce5d7 _PyEval_EvalFrameDefault + 10791
14      0x5609c0fe39fc _PyFunction_Vectorcall + 124
15      0x5609c0fcc26d _PyEval_EvalFrameDefault + 1725
16      0x5609c0fc89c6 python(+0x13f9c6) [0x5609c0fc89c6]
17      0x5609c10be256 PyEval_EvalCode + 134
18      0x5609c10e9108 python(+0x260108) [0x5609c10e9108]
19      0x5609c10e29cb python(+0x2599cb) [0x5609c10e29cb]
20      0x5609c10e8e55 python(+0x25fe55) [0x5609c10e8e55]
21      0x5609c10e8338 _PyRun_SimpleFileObject + 424
22      0x5609c10e7f83 _PyRun_AnyFileObject + 67
23      0x5609c10daa5e Py_RunMain + 702
24      0x5609c10b102d Py_BytesMain + 45
25      0x7f6cedafdd90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6cedafdd90]
26      0x7f6cedafde40 __libc_start_main + 128
27      0x5609c10b0f25 _start + 37
```
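The mismatch (32 vs 44) is consistent with the plugin and non-plugin attention paths expecting different KV-cache shapes for the same named dimension. Conceptually, the runtime checks that every tensor sharing the named dimension `past_key_len` has the same extent, along the lines of this hypothetical sketch (not the actual `shapeMachine`/`tllmRuntime` code):

```python
# Hypothetical sketch of a named-dimension consistency check like the one
# TensorRT performs on `past_key_len` (not the actual shapeMachine code).

def check_named_dims(tensors):
    """Raise if two tensors disagree on the extent of a shared named dimension."""
    seen = {}
    for tensor_name, shape in tensors.items():
        for dim_name, extent in shape.items():
            if dim_name in seen and seen[dim_name] != extent:
                raise RuntimeError(
                    f"Dimensions with name {dim_name} must be equal: "
                    f"{seen[dim_name]} != {extent}")
            seen[dim_name] = extent

# Plugin layers feed one extent, the plugin-free last layer another:
try:
    check_named_dims({
        "past_key_value_0": {"past_key_len": 32},
        "past_key_value_last": {"past_key_len": 44},
    })
except RuntimeError as e:
    print(e)  # Dimensions with name past_key_len must be equal: 32 != 44
```

If that is the cause, the mixed-plugin engine would be rejected at the first generation step regardless of input, which matches the behavior above.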
@QiJune any updates on this?
@llan-ml, apologies for the very delayed response. Is this ticket still relevant? If so, could you try the latest version to see whether the issue persists?
Issue has not received an update in over 14 days. Adding stale label.
Closing issue as stale; please feel free to open a new one if the problem persists.