trt-llm-rag-windows
AttributeError: 'WeightOnlyGroupwiseQuantLinear' object has no attribute 'prequant_scaling_factor'
Receiving the above error when attempting to build the TRT engine.
Using a 3090 with driver 546.33, CUDA 12.3 and tensorrt_llm-0.7.1
Traceback (most recent call last):
File "Z:\oracle\model\TensorRT-LLM\examples\llama\build.py", line 983, in
Hello, can you please share the build.py command used for engine generation?
We just released an updated version, 0.3. Please use that branch and follow the README to set up the application: https://github.com/NVIDIA/ChatRTX/blob/release/0.3/README.md