llama_index
[Bug]: KeyError: 'plugin_config' with TensorRT-LLM
Bug Description
Instantiating the LocalTensorRTLLM object:
from llama_index.llms.nvidia_tensorrt import LocalTensorRTLLM

llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="rank0.engine",
    tokenizer_dir="/root/.cache/huggingface/hub/models--Qwen--Qwen1.5-72B-Chat-GPTQ-Int4/snapshots/>
    completion_to_prompt=completion_to_prompt,
    verbose=True,
    max_new_tokens=640,
    temperature=0,
)
Results in:
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
File "/opt/tritonserver/ai.py", line 76, in <module>
llm = LocalTensorRTLLM(
File "/usr/local/lib/python3.10/dist-packages/llama_index/llms/nvidia_tensorrt/base.py", line 109, in __init__
use_gpt_attention_plugin = config["plugin_config"][
KeyError: 'plugin_config'
Version
0.10.23
Steps to Reproduce
Run the code in the bug description above.
Relevant Logs/Tracebacks
No response
Hey there, @teis-e! Great to see you back 😊 Hope you're doing well!
The KeyError for 'plugin_config' you're encountering when constructing the LocalTensorRTLLM object in LlamaIndex v0.10.23 is most likely caused by the absence of a 'plugin_config' key in the config.json file associated with the model you are trying to load. This file is expected to contain the configuration details needed to initialize the model, including the plugin configuration.
In the integration's code, the error occurs when it reads the plugin_config key and its nested keys (gpt_attention_plugin, remove_input_padding, paged_kv_cache) from the config dictionary loaded from that config.json file:

use_gpt_attention_plugin = config["plugin_config"]["gpt_attention_plugin"]
remove_input_padding = config["plugin_config"]["remove_input_padding"]
paged_kv_cache = config["plugin_config"]["paged_kv_cache"]
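
As a quick sanity check, you can load the same config.json yourself and see whether the key exists before constructing the LLM. This is a minimal sketch; the engine directory path is an assumption, so point it at your own engine directory:

import json
from pathlib import Path

# Assumed path: replace with the directory that holds your rank0.engine.
engine_dir = Path("./model")

with (engine_dir / "config.json").open() as f:
    config = json.load(f)

# If 'plugin_config' is not in this list, LocalTensorRTLLM will raise
# the KeyError shown above.
print("top-level keys:", sorted(config))

If plugin_config is not among those keys, that confirms the config.json, rather than the LlamaIndex code, is what needs fixing.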
To resolve this issue, ensure that the config.json file located in the model directory (engine_dir_path / "config.json") contains the plugin_config key with all of the required nested keys (gpt_attention_plugin, remove_input_padding, paged_kv_cache). The structure should look something like this, with the boolean values set to match your engine build:
{
    "plugin_config": {
        "gpt_attention_plugin": true,
        "remove_input_padding": false,
        "paged_kv_cache": true
    },
    "builder_config": {
        ...
    }
}
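
If you want to fail fast with a clearer message, a small validation step before constructing LocalTensorRTLLM can help. This is only a sketch under the same path assumption as above; the fallback lookup under build_config is an assumption about how some newer TensorRT-LLM engine builds nest their plugin settings, so verify it against your own config.json:

import json

REQUIRED_KEYS = ("gpt_attention_plugin", "remove_input_padding", "paged_kv_cache")

with open("./model/config.json") as f:  # assumed engine directory
    config = json.load(f)

# Assumption: some newer TensorRT-LLM builds nest plugin settings under
# "build_config" -- check your own config.json layout before relying on this.
plugin_config = config.get("plugin_config") or config.get("build_config", {}).get("plugin_config")

if plugin_config is None:
    raise ValueError("config.json has no 'plugin_config' section")

missing = [key for key in REQUIRED_KEYS if key not in plugin_config]
if missing:
    raise ValueError(f"plugin_config is missing required keys: {missing}")

This mirrors the lookups LocalTensorRTLLM performs, but raises a more descriptive error than the bare KeyError.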
Make sure the values for these keys are correctly set according to your model's requirements. You can refer to the relevant section of the code for more details.
As for your other questions, I wasn't able to find specific answers within the repository. It's possible that changes in LlamaIndex v0.10.24 address this 'plugin_config' KeyError, or that it is a known issue in v0.10.23, but without more information I can't provide a definitive answer. If you have more details, or if there's something specific you'd like me to look into, please let me know.
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.