[Bug]: LocalTensorRTLLM not loading, with no debug output.
Bug Description
I obtained all the necessary files as described in the Nvidia TensorRT-LLM documentation.
My directory looks like this:
```
app.py
model/
├── .gitkeep
├── config.json
├── generation_config.json
├── LICENSE.txt
├── llama_float16_tp1_rank0.engine
├── model.cache
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── README.md
├── Responsible-Use-Guide.pdf
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
├── tokenizer_config.json
└── USE_POLICY.md
```
Everything runs in a venv with a working llama_index installation.
CUDA, cuDNN, etc. are installed.
The operating system is Ubuntu 22.04 on WSL 2 under Windows 10.
```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.llms import LocalTensorRTLLM


def completion_to_prompt(completion: str) -> str:
    """
    Given a completion, return the prompt using llama2 format.
    """
    return f" [INST] {completion} [/INST] "


llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

resp = llm.complete("Who is Paul Graham?")
print(str(resp))
```
Version
0.9.43
Steps to Reproduce
Running the code with all necessary libraries installed.
Relevant Logs/Tracebacks
No response
🤖
Hey @teis-e! Great to see you back. Hope you're doing well!
Based on the information you've provided, it's difficult to pinpoint the exact issue as there are no error messages or logs. However, I can suggest a few things to check:
- **Model File Existence**: Ensure that the model file exists at the specified path. The `LocalTensorRTLLM` class checks whether the model file exists at the provided path and raises a `ValueError` if it does not.
- **TensorRT-LLM Library**: Make sure that the TensorRT-LLM library is installed and can be imported correctly. The code tries to import the `tensorrt_llm` library and raises an `ImportError` if it fails.
- **CUDA Availability**: The code requires a CUDA-enabled GPU to run. It checks whether CUDA is available and raises an `AssertionError` if it is not. Please ensure that a CUDA-enabled GPU is available and the necessary drivers are installed.
- **World Size Matching**: The code checks whether the world size of the engine matches the runtime world size and raises an `AssertionError` if they do not match. Please ensure that the world sizes match.
- **Pipeline Parallelism**: The code does not support pipeline parallelism and raises an `AssertionError` if the pipeline-parallelism size is not 1. Please ensure that the pipeline-parallelism size is 1.
- **Engine File Opening**: The code tries to open the engine file and raises an error if it fails. Please ensure that the engine file can be opened correctly. (A quick diagnostic sketch for the first three checks follows this list.)
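You can run the first three checks manually before constructing the LLM. Below is a minimal diagnostic sketch, assuming PyTorch is available in the venv; the `./model` directory and engine file name are taken from your report, not from the `LocalTensorRTLLM` source:

```python
import os

import torch

# Paths taken from the bug report; adjust to your own layout.
model_path = "./model"
engine_name = "llama_float16_tp1_rank0.engine"

# 1. Model/engine file existence
engine_file = os.path.join(model_path, engine_name)
print("engine file exists:", os.path.isfile(engine_file))
print("config.json exists:", os.path.isfile(os.path.join(model_path, "config.json")))

# 2. TensorRT-LLM importability
try:
    import tensorrt_llm  # noqa: F401
    print("tensorrt_llm import: OK")
except ImportError as e:
    print("tensorrt_llm import failed:", e)

# 3. CUDA availability (requires a CUDA-enabled GPU visible inside WSL 2)
print("CUDA available:", torch.cuda.is_available())
```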
Regarding the `model_path` and `engine_name` parameters of the `LocalTensorRTLLM` class: `model_path` should be a directory containing the TensorRT engine and its configuration, and `engine_name` (if provided) should be the name of the serialized TensorRT engine file within that directory.
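As a concrete illustration of how the two parameters fit together (the file names here are the ones from your report, used purely as an example), the engine file is expected at the join of the directory and the engine name:

```python
from pathlib import Path

model_path = Path("./model")                    # directory holding the engine and its config
engine_name = "llama_float16_tp1_rank0.engine"  # serialized TensorRT engine inside it

engine_file = model_path / engine_name
print(engine_file, "exists:", engine_file.is_file())
```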
If you're still encountering issues, could you please provide more specific error messages or logs? This would help in diagnosing the problem more accurately.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.