trt-llm-rag-windows
Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match.
When I run the command to start the application, I get the version mismatch error:
The command:
python app.py --trt_engine_path model/ --trt_engine_name llama_float16_tp1_rank0.engine --tokenizer_dir_path Llama-2-13b-chat-hf --data_dir dataset/
The error:
Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 228, Serialized Engine Version: 226)
Traceback (most recent call last):
File "C:\Users\unubi\trt-llm-rag-windows\app.py", line 63, in <module>
llm = TrtLlmAPI(
File "C:\Users\unubi\trt-llm-rag-windows\trt_llama_api.py", line 166, in __init__
decoder = tensorrt_llm.runtime.GenerationSession(self._model_config,
File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 457, in __init__
self.runtime = _Runtime(engine_buffer, mapping)
File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 150, in __init__
self.__prepare(mapping, engine_buffer)
File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 168, in __prepare
assert self.engine is not None
AssertionError
Exception ignored in: <function _Runtime.__del__ at 0x000001F6C370CA60>
Traceback (most recent call last):
File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 266, in __del__
cudart.cudaFree(self.address) # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'
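What the failing assertion guards can be sketched in plain Python (this is illustrative only, not TensorRT's actual code): the runtime embeds a serialization version constant, and it refuses to deserialize an engine whose stored version tag differs. Here, the installed runtime expects version 228 but the engine was serialized with version 226, so the engine must be rebuilt with the matching TRT-LLM/TensorRT version.

```python
# Illustrative sketch of the check behind
# "assertion stdVersionRead == kSERIALIZATION_VERSION failed" --
# names and layout are made up for clarity, not TensorRT internals.
KSERIALIZATION_VERSION = 228  # version compiled into the installed runtime


def check_version_tag(std_version_read: int) -> None:
    """Refuse to load an engine serialized under a different version."""
    if std_version_read != KSERIALIZATION_VERSION:
        raise RuntimeError(
            "Version tag does not match. "
            f"Current Version: {KSERIALIZATION_VERSION}, "
            f"Serialized Engine Version: {std_version_read}"
        )
```

With the versions from the log above, `check_version_tag(226)` raises, which is why `self.engine` ends up `None` and the later `AssertionError` fires.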
I got the same issue with the pre-built 4090 engine.
Hello, which TRT-LLM version are you using to build the engine? For compatibility, the engine should be built with TRT-LLM version 0.5.
@MustaphaU and @teis-e : Can you please share the command used to install the wheel?
@shishirgoyal85 @anujj
Installing tensorrt_llm version 0.5, i.e.
pip install tensorrt_llm==0.5 --extra-index-url https://pypi.nvidia.com/ --extra-index-url https://download.pytorch.org/whl/cu121
results in the error:
ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121
Here is the full log:
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com, https://download.pytorch.org/whl/cu121
Collecting tensorrt_llm==0.5
Using cached https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-0.5.0-0-cp310-cp310-win_amd64.whl (431.5 MB)
Collecting build (from tensorrt_llm==0.5)
Using cached build-1.0.3-py3-none-any.whl.metadata (4.2 kB)
INFO: pip is looking at multiple versions of tensorrt-llm to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.1.0.dev20230828+cu121 (from tensorrt-llm) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.0+cu121, 2.1.1, 2.1.1+cu121, 2.1.2, 2.1.2+cu121, 2.2.0, 2.2.0+cu121)
ERROR: No matching distribution found for torch==2.1.0.dev20230828+cu121
@MustaphaU: The command below worked for me for TRT-LLM 0.5:
pip install tensorrt-llm==0.5.0.post1 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/nightly/cu121 --extra-index-url https://download.pytorch.org/whl/cu121
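If you are scripting the setup, a quick stdlib-only sanity check can confirm that the installed wheel really is a 0.5.x build before you rebuild the engine (the helper name here is made up for illustration):

```python
# Illustrative version check using only the standard library.
from importlib.metadata import version, PackageNotFoundError


def is_trt_llm_05(installed: str) -> bool:
    """Return True for any 0.5.x release string, e.g. '0.5.0.post1'."""
    return installed.startswith("0.5")


try:
    print(is_trt_llm_05(version("tensorrt_llm")))
except PackageNotFoundError:
    print("tensorrt_llm is not installed")
```

The fix in the command above is the added nightly PyTorch index (whl/nightly/cu121), which is where the torch==2.1.0.dev20230828+cu121 dev wheel that the 0.5 release pinned was published.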
We just released an updated version, 0.3. Please use that branch and follow the README at https://github.com/NVIDIA/ChatRTX/blob/release/0.3/README.md to set up the application.