llama_index
[Bug]: KeyError: 'plugin_config' with TensorRT-LLM
Bug Description
Instantiating the LocalTensorRTLLM object:
from llama_index.llms.nvidia_tensorrt import LocalTensorRTLLM

llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="rank0.engine",
    tokenizer_dir="/root/.cache/huggingface/hub/models--Qwen--Qwen1.5-72B-Chat-GPTQ-Int4/snapshots/>
    completion_to_prompt=completion_to_prompt,
    verbose=True,
    max_new_tokens=640,
    temperature=0,
)
Results in:
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
File "/opt/tritonserver/ai.py", line 76, in <module>
llm = LocalTensorRTLLM(
File "/usr/local/lib/python3.10/dist-packages/llama_index/llms/nvidia_tensorrt/base.py", line 109, in __init__
use_gpt_attention_plugin = config["plugin_config"][
KeyError: 'plugin_config'
Version
0.10.23
Steps to Reproduce
Run the code in the bug description above.
Relevant Logs/Tracebacks
No response
Hey there, @teis-e! Great to see you back 😊 Hope you're doing well!
The KeyError for 'plugin_config' you're encountering when constructing the LocalTensorRTLLM object in LlamaIndex v0.10.23 is most likely caused by the absence of a 'plugin_config' key in the config.json file associated with the model you are trying to load. This file is expected to contain the configuration details needed to initialize the model, including the plugin configuration.
In the integration's code, the error occurs when it reads the plugin_config key and its nested keys (gpt_attention_plugin, remove_input_padding, paged_kv_cache) from the config dictionary loaded from that config.json file:

use_gpt_attention_plugin = config["plugin_config"]["gpt_attention_plugin"]
remove_input_padding = config["plugin_config"]["remove_input_padding"]
paged_kv_cache = config["plugin_config"]["paged_kv_cache"]
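
As a quick sanity check, you can load the same config.json yourself and see whether the key exists before constructing the LLM. This is a minimal sketch; the engine directory path is an assumption, so point it at your own engine directory:

import json
from pathlib import Path

# Assumed path: replace with the directory that holds your rank0.engine.
engine_dir = Path("./model")

with (engine_dir / "config.json").open() as f:
    config = json.load(f)

# If 'plugin_config' is not in this list, LocalTensorRTLLM will raise
# the KeyError shown above.
print("top-level keys:", sorted(config))

If plugin_config is not among those keys, that confirms the config.json, rather than the LlamaIndex code, is what needs fixing.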
To resolve this issue, ensure that the config.json file located in the model directory (engine_dir_path / "config.json") contains the plugin_config key with all of the required nested keys (gpt_attention_plugin, remove_input_padding, paged_kv_cache). The structure should look something like this, with the boolean values set to match your engine build:
{
    "plugin_config": {
        "gpt_attention_plugin": true,
        "remove_input_padding": false,
        "paged_kv_cache": true
    },
    "builder_config": {
        ...
    }
}
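
If you want to fail fast with a clearer message, a small validation step before constructing LocalTensorRTLLM can help. This is only a sketch under the same path assumption as above; the fallback lookup under build_config is an assumption about how some newer TensorRT-LLM engine builds nest their plugin settings, so verify it against your own config.json:

import json

REQUIRED_KEYS = ("gpt_attention_plugin", "remove_input_padding", "paged_kv_cache")

with open("./model/config.json") as f:  # assumed engine directory
    config = json.load(f)

# Assumption: some newer TensorRT-LLM builds nest plugin settings under
# "build_config" -- check your own config.json layout before relying on this.
plugin_config = config.get("plugin_config") or config.get("build_config", {}).get("plugin_config")

if plugin_config is None:
    raise ValueError("config.json has no 'plugin_config' section")

missing = [key for key in REQUIRED_KEYS if key not in plugin_config]
if missing:
    raise ValueError(f"plugin_config is missing required keys: {missing}")

This mirrors the lookups LocalTensorRTLLM performs, but raises a more descriptive error than the bare KeyError.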
Make sure the values for these keys are correctly set according to your model's requirements. You can refer to the relevant section of the code for more details.
As for your other questions, I wasn't able to find specific answers within the repository. It's possible that changes in LlamaIndex v0.10.24 address this 'plugin_config' KeyError, or that it is a known issue in v0.10.23, but without more information I can't provide a definitive answer. If you have more details, or if there's something specific you'd like me to look into, please let me know.
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.