trt-llm-rag-windows
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
The bindings module seems to be missing. This results in an error when I run the command to build the TRT engine, following the instructions in the README.
For RTX 4090 (TensorRT 9.1.0.4 & TensorRT-LLM 0.5.0), a prebuilt TRT engine is provided. For other RTX GPUs or TensorRT versions, follow these steps to build your TRT engine:
Download LLaMa 2 13B chat model from https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
Download LLaMa 2 13B AWQ int4 checkpoints model.pt from here
Clone the TensorRT LLM repository:
git clone https://github.com/NVIDIA/TensorRT-LLM.git
Navigate to the examples\llama directory and run the following script:
python build.py --model_dir <path to llama13_chat model> --quant_ckpt_path <path to model.pt> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir <TRT engine folder>
Here is my custom build command:
python build.py --model_dir C:\Users\unubi\trt-llm-rag-windows\Llama-2-13b-chat-hf --quant_ckpt_path C:\Users\unubi\trt-llm-rag-windows\checkpoint\model.pt --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir C:\Users\unubi\trt-llm-rag-windows\engine
Here is the full error log:
Traceback (most recent call last):
File "C:\Users\unubi\trt-llm-rag-windows\TensorRT-LLM\examples\llama\build.py", line 39, in <module>
import tensorrt_llm
File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\__init__.py", line 29, in <module>
from .hlapi.llm import LLM, ModelConfig
File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\__init__.py", line 1, in <module>
from .llm import LLM, ModelConfig
File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\llm.py", line 18, in <module>
from ..executor import GenerationExecutor, GenerationResult
File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\executor.py", line 10, in <module>
import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
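A quick way to narrow this down, before touching build.py, is to check whether Python can resolve the package and the `bindings` submodule at all, and from where. This is only a diagnostic sketch (the module names are taken from the traceback above); it shows whether the interpreter is picking up the installed wheel or the cloned source tree, which has no compiled bindings:

```python
# Diagnostic: report where Python would load a module from, without
# actually importing the heavy package. 'missing' means the module
# (or its parent) cannot be resolved at all.
import importlib.util

def module_status(name: str) -> str:
    """Return the file Python would load `name` from, or 'missing'."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is absent.
        return "missing"
    return spec.origin if spec and spec.origin else "missing"

if __name__ == "__main__":
    for mod in ("tensorrt_llm", "tensorrt_llm.bindings"):
        print(mod, "->", module_status(mod))
```

If `tensorrt_llm` resolves to a path inside the cloned repo rather than site-packages, the import in build.py is hitting the source checkout, which would explain the missing `bindings` module.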
This repo needs TensorRT-LLM version 0.5 to build TRT engines. Please use the command below to install the wheel:
pip install tensorrt_llm==0.5 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
Please use the v0.5.0 source code to build the engine: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.5.0
@anujj Installing tensorrt_llm version 0.5, i.e.
pip install tensorrt_llm==0.5 --extra-index-url https://pypi.nvidia.com/ --extra-index-url https://download.pytorch.org/whl/cu121
results in the error:
ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121
Here is the full log:
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com, https://download.pytorch.org/whl/cu121
Collecting tensorrt_llm==0.5
Using cached https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-0.5.0-0-cp310-cp310-win_amd64.whl (431.5 MB)
Collecting build (from tensorrt_llm==0.5)
Using cached build-1.0.3-py3-none-any.whl.metadata (4.2 kB)
INFO: pip is looking at multiple versions of tensorrt-llm to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.1.0.dev20230828+cu121 (from tensorrt-llm) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.0+cu121, 2.1.1, 2.1.1+cu121, 2.1.2, 2.1.2+cu121, 2.2.0, 2.2.0+cu121)
ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121
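The failure is mechanical: the 0.5.0 wheel pins an exact nightly torch build (`2.1.0.dev20230828+cu121`) that is no longer published on the indexes pip is searching. A PEP 440 `==` pin that includes a local version segment only accepts that exact build, so none of the versions pip listed can satisfy it. A minimal sketch of the comparison (the candidate list is abbreviated from pip's error message above):

```python
# The pinned requirement accepts only the exact nightly build.
# None of the torch versions pip can actually see is equal to it,
# hence "No matching distribution found".
pinned = "2.1.0.dev20230828+cu121"
available = ["2.1.0", "2.1.0+cu121", "2.1.1+cu121", "2.1.2+cu121", "2.2.0+cu121"]

matches = [v for v in available if v == pinned]
print(matches)  # -> []
```

(Strictly, PEP 440 equality normalizes version strings rather than comparing them character-for-character, but for these candidates the outcome is the same: no match.)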
@MustaphaU Try using 0.5.0.post1; this should resolve the error finding the torch version.
pip install tensorrt_llm==0.5.0.post1 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
@miduthuruk, if possible, try pulling just the TensorRT bindings as below; make sure the version exactly matches the installed TensorRT version.
pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com tensorrt_bindings==9.2.0.post12.dev5
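Since the fix above depends on the bindings version matching the installed TensorRT version, it helps to print both side by side before and after installing. A small sketch (the distribution names are the ones used in this thread; on a machine without them installed it simply prints None):

```python
# Report the installed versions of the two distributions so they can
# be compared. Returns None for anything not installed.
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist: str):
    """Return the installed distribution's version string, or None."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for dist in ("tensorrt", "tensorrt_bindings"):
        print(dist, "->", installed_version(dist))
```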
Resolved. Thanks @BLSharda @anujj. Will close the issue now.