
[Bug]: LocalTensorRTLLM not loading, with no debug output.

Open · teis-e opened this issue 1 year ago · 1 comment

Bug Description

I got all the necessary files as described in the NVIDIA TensorRT-LLM documentation.

Like this in my dir:

```
app.py
model/
├── .gitkeep
├── config.json
├── generation_config.json
├── LICENSE.txt
├── llama_float16_tp1_rank0.engine
├── model.cache
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── README.md
├── Responsible-Use-Guide.pdf
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
├── tokenizer_config.json
└── USE_POLICY.md
```

Everything is installed in a venv with a working LlamaIndex.

CUDA, cuDNN, etc. are installed.

The operating system is Ubuntu 22.04 on WSL 2 under Windows 10.

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.llms import LocalTensorRTLLM


def completion_to_prompt(completion: str) -> str:
    """Given a completion, return the prompt using llama2 format."""
    return f" [INST] {completion} [/INST] "


llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

resp = llm.complete("Who is Paul Graham?")
print(str(resp))
```

Version

0.9.43

Steps to Reproduce

Running the code with all necessary libraries installed.

Relevant Logs/Tracebacks

No response

teis-e · Feb 04 '24 18:02

🤖

Hey @teis-e! Great to see you back. Hope you're doing well!

Based on the information you've provided, it's difficult to pinpoint the exact issue as there are no error messages or logs. However, I can suggest a few things to check:

  1. Model File Existence: Ensure that the model file exists at the specified path. The LocalTensorRTLLM class checks if the model file exists at the provided path and raises a ValueError if it does not.

  2. TensorRT-LLM Library: Make sure that the TensorRT-LLM library is installed and can be imported correctly. The code tries to import the tensorrt_llm library and raises an ImportError if it fails.

  3. CUDA Availability: The code requires a CUDA-enabled GPU to run. It checks if CUDA is available and raises an AssertionError if it is not. Please ensure that a CUDA-enabled GPU is available and the necessary drivers are installed.

  4. World Size Matching: The code checks if the world size of the engine matches the runtime world size and raises an AssertionError if they do not match. Please ensure that the world sizes match.

  5. Pipeline Parallelism: The code does not support pipeline parallelism. It raises an AssertionError if the pipeline parallelism size is not 1. Please ensure that the pipeline parallelism size is 1.

  6. Engine File Opening: The code tries to open the engine file and raises an error if it fails. Please ensure that the engine file can be opened and read (see the preflight sketch after this list).
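
Since the constructor seems to fail silently, it may help to run a small preflight script before constructing LocalTensorRTLLM. This is a minimal sketch covering checks 1–3 and 6 only, not LlamaIndex's own validation code; the paths are taken from your report, and the world-size and pipeline-parallelism checks are omitted because they depend on how the engine was built:

```python
import os

# Paths taken from the issue report; adjust them to your setup.
MODEL_DIR = "./model"
ENGINE_NAME = "llama_float16_tp1_rank0.engine"

# Checks 1 and 6: the model directory and the serialized engine file exist and can be opened.
engine_path = os.path.join(MODEL_DIR, ENGINE_NAME)
if not os.path.isdir(MODEL_DIR):
    raise SystemExit(f"Model directory not found: {MODEL_DIR}")
if not os.path.isfile(engine_path):
    raise SystemExit(f"Engine file not found: {engine_path}")
with open(engine_path, "rb") as f:
    f.read(16)  # just confirm the file is readable

# Check 2: tensorrt_llm must be importable in the venv you run app.py from.
try:
    import tensorrt_llm  # noqa: F401
    print("tensorrt_llm imported, version:", getattr(tensorrt_llm, "__version__", "unknown"))
except ImportError as exc:
    raise SystemExit(f"tensorrt_llm could not be imported: {exc}")

# Check 3: a CUDA-capable GPU must be visible from inside WSL.
import torch

if not torch.cuda.is_available():
    raise SystemExit("torch.cuda.is_available() is False; check the NVIDIA driver / WSL GPU support")
print("CUDA device:", torch.cuda.get_device_name(0))
```

If any of these steps fails, that failure is the place to start debugging before touching the LocalTensorRTLLM call itself.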

Regarding the model_path and engine_name parameters in the LocalTensorRTLLM class, model_path should be a directory containing the TensorRT engine and its configuration, and engine_name (if provided) should be the name of the serialized TensorRT engine file within that directory.
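
Concretely, for the directory listing above, the arguments map onto the files roughly like this (a sketch mirroring your snippet; whether tokenizer_dir may instead point at the local ./model directory, which already contains tokenizer.json and tokenizer.model, is an assumption to verify rather than documented behavior quoted here):

```python
from llama_index.llms import LocalTensorRTLLM

llm = LocalTensorRTLLM(
    # Directory holding the engine and its configuration (config.json in the listing above).
    model_path="./model",
    # Name of the serialized TensorRT engine file inside model_path.
    engine_name="llama_float16_tp1_rank0.engine",
    # As in your snippet, a Hugging Face model id; pointing this at "./model" instead
    # may also work but is an assumption, not confirmed behavior.
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    verbose=True,
)
```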

If you're still encountering issues, could you please provide more specific error messages or logs? This would help in diagnosing the problem more accurately.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] · Feb 04 '24 18:02