
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'

Open MustaphaU opened this issue 1 year ago • 3 comments

The bindings module is missing, it seems. This results in an error when I run the command to build the TRT engine following the instructions in the README:

For RTX 4090 (TensorRT 9.1.0.4 & TensorRT-LLM 0.5.0), a prebuilt TRT engine is provided. For other RTX GPUs or TensorRT versions, follow these steps to build your TRT engine:

Download LLaMa 2 13B chat model from https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

Download LLaMa 2 13B AWQ int4 checkpoints model.pt from here

Clone the TensorRT LLM repository:

git clone https://github.com/NVIDIA/TensorRT-LLM.git

Navigate to the examples\llama directory and run the following script:

python build.py --model_dir <path to llama13_chat model> --quant_ckpt_path <path to model.pt> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir <TRT engine folder>

Here is my custom build command:

python build.py --model_dir C:\Users\unubi\trt-llm-rag-windows\Llama-2-13b-chat-hf --quant_ckpt_path C:\Users\unubi\trt-llm-rag-windows\checkpoint\model.pt --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir C:\Users\unubi\trt-llm-rag-windows\engine

Here is the full error log:


Traceback (most recent call last):
  File "C:\Users\unubi\trt-llm-rag-windows\TensorRT-LLM\examples\llama\build.py", line 39, in <module>
    import tensorrt_llm
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\__init__.py", line 29, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\__init__.py", line 1, in <module>     
    from .llm import LLM, ModelConfig
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\llm.py", line 18, in <module>
    from ..executor import GenerationExecutor, GenerationResult
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\executor.py", line 10, in <module>
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
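A quick way to see what Python is actually importing here is a stdlib-only check (a hedged sketch; the module names are taken from the traceback above). If `tensorrt_llm` resolves to the source checkout rather than the installed wheel, the compiled `bindings` extension that ships inside the wheel will not be found:

```python
import importlib.util

def module_status(name: str) -> str:
    """Report whether a module can be located, and from where."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package in a dotted name is not importable.
        return f"{name}: NOT FOUND (parent package not importable)"
    if spec is None:
        return f"{name}: NOT FOUND"
    return f"{name}: found at {spec.origin}"

# If the first line points into the cloned repo (...\tensorrt-llm\tensorrt_llm\__init__.py)
# rather than site-packages, Python is shadowing the installed wheel with the
# source tree, and the compiled 'bindings' extension will be missing.
print(module_status("tensorrt_llm"))
print(module_status("tensorrt_llm.bindings"))
```

Running `build.py` from a directory where the cloned `tensorrt_llm` source tree is on `sys.path` is one common way to hit this, independent of which wheel is installed.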

MustaphaU avatar Jan 27 '24 16:01 MustaphaU

This git repo needs TensorRT-LLM version 0.5 to build TRT engines. Please use the command below to install the wheel:

pip install tensorrt_llm==0.5 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121

Please use the v0.5.0 source code to build the engine: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.5.0
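After installing, it is worth confirming which version pip actually resolved before re-running build.py. A stdlib-only sketch (the distribution name `tensorrt_llm` comes from the pip command above):

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# After a successful 'pip install tensorrt_llm==0.5 ...' this should print the
# pinned 0.5.x version; None means the wheel did not actually install.
print(installed_version("tensorrt_llm"))
```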

anujj avatar Feb 12 '24 12:02 anujj

@anujj Installing tensorrt_llm version 0.5, i.e. pip install tensorrt_llm==0.5 --extra-index-url https://pypi.nvidia.com/ --extra-index-url https://download.pytorch.org/whl/cu121, results in the error:

ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121

Here is the full log:

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com, https://download.pytorch.org/whl/cu121
Collecting tensorrt_llm==0.5
  Using cached https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-0.5.0-0-cp310-cp310-win_amd64.whl (431.5 MB)
Collecting build (from tensorrt_llm==0.5)
  Using cached build-1.0.3-py3-none-any.whl.metadata (4.2 kB)
INFO: pip is looking at multiple versions of tensorrt-llm to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.1.0.dev20230828+cu121 (from tensorrt-llm) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.0+cu121, 2.1.1, 2.1.1+cu121, 2.1.2, 2.1.2+cu121, 2.2.0, 2.2.0+cu121)
ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121

MustaphaU avatar Feb 12 '24 15:02 MustaphaU

@MustaphaU Try using 0.5.0.post1; this should resolve the error with finding the torch version.

pip install tensorrt_llm==0.5.0.post1 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121

BLSharda avatar Feb 15 '24 07:02 BLSharda

@miduthuruk, if possible, you can try to just pull the TensorRT bindings as below; make sure the version matches the installed TensorRT version exactly.

pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com tensorrt_bindings==9.2.0.post12.dev5
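The "match the version exactly" caveat can be checked mechanically. A hedged, stdlib-only sketch (the distribution names `tensorrt` and `tensorrt_bindings` are taken from the command above):

```python
from importlib import metadata

def versions_match(dist_a: str, dist_b: str) -> bool:
    """True only when both distributions are installed with identical
    version strings (e.g. both '9.2.0.post12.dev5')."""
    try:
        return metadata.version(dist_a) == metadata.version(dist_b)
    except metadata.PackageNotFoundError:
        # One of the two is not installed at all.
        return False

if not versions_match("tensorrt", "tensorrt_bindings"):
    print("tensorrt / tensorrt_bindings version mismatch (or one is missing); "
          "reinstall tensorrt_bindings pinned to the installed tensorrt version.")
```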

BLSharda avatar Feb 19 '24 11:02 BLSharda

Resolved. Thanks @BLSharda @anujj. Will close the issue now.

MustaphaU avatar Feb 27 '24 13:02 MustaphaU