tensorrtllm_backend
get_parameter(model_config, "max_attention_window_size", int) does not support a list
System Info
GPU: A100
Who can help?
@ncomly-nvidia
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
In all_models/inflight_batcher_llm/tensorrt_llm/1/model.py, line 422 reads:
get_parameter(model_config, "max_attention_window_size", int),
But I want to set max_attention_window_size as a list, so that every layer has its own max_attention_window_size.
I also want to set this list in
all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt:
parameters: {
key: "max_attention_window_size"
value: {
string_value: "${max_attention_window_size}"
}
}
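A minimal sketch of how the backend could accept a list here, assuming the parameter still arrives through the usual `string_value` field. The helper name `get_parameter_list` is hypothetical and not part of the backend; it simply JSON-parses the string and normalizes a scalar to a one-element list:

```python
import json

def get_parameter_list(model_config, name):
    """Hypothetical helper: parse a Triton parameter as a per-layer int list.

    Accepts either a single integer string ("4096") or a JSON-style
    list string ("[8192, 4096]"); returns a list of ints, or None if unset.
    """
    value = model_config.get("parameters", {}).get(name, {}).get("string_value", "")
    if not value:
        return None
    parsed = json.loads(value)
    if isinstance(parsed, int):
        return [parsed]
    return [int(v) for v in parsed]

# Example: a config.pbtxt parameter holding a per-layer list as a string.
config = {
    "parameters": {
        "max_attention_window_size": {"string_value": "[8192, 4096]"}
    }
}
print(get_parameter_list(config, "max_attention_window_size"))  # [8192, 4096]
```

This would stay backward compatible with existing configs that pass a single int.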
I need this feature for Gemma 2:
max_attention_window_size = [8192, 4096]*21
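To show what the line above produces: Gemma 2 interleaves sliding-window and global-attention layers, so the per-layer window sizes repeat in pairs across the 42 layers. A sketch of building the list and serializing it for the config string (the assumption that the backend would JSON-parse this string is mine, not the backend's current behavior):

```python
import json

# Repeating the pair [8192, 4096] 21 times yields one window size
# per layer for a 42-layer model.
max_attention_window_size = [8192, 4096] * 21

# Serialized form that could be substituted for ${max_attention_window_size}
# in config.pbtxt, assuming the backend parsed it as JSON.
string_value = json.dumps(max_attention_window_size)
print(len(max_attention_window_size))  # 42
```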
Expected behavior
max_attention_window_size should accept a list of per-layer values.
actual behavior
A list is not supported; only a single int is accepted.
additional notes
None.