TensorRT-LLM
UnboundLocalError: local variable 'groupwise_qweight_safetensors' referenced before assignment
python build.py --hf_model_dir Qwen-7B-Chat \
    --quant_ckpt_path ./qwen_7b_4bit_gs128_awq.pt \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_awq \
    --per_group \
    --world_size 1 \
    --tp_size 1 \
    --output_dir ./tmp/Qwen/7B/trt_engines/int4-awq/1-gpu
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[02/08/2024-14:19:14] [TRT-LLM] [I] Serially build TensorRT engines.
[02/08/2024-14:19:14] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 119, GPU 1256 (MiB)
[02/08/2024-14:19:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1800, GPU +312, now: CPU 2055, GPU 1568 (MiB)
[02/08/2024-14:19:16] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[02/08/2024-14:19:16] [TRT-LLM] [I] Loading weights from groupwise AWQ Qwen safetensors...
Loading weights...: 100%|███████████████████████████████████████████████████████████████| 32/32 [00:45<00:00, 1.43s/it]
[02/08/2024-14:20:07] [TRT-LLM] [I] Weights loaded. Total time: 00:00:51
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/qwen/build.py", line 705, in <module>
    build(0, args)
  File "/app/tensorrt_llm/examples/qwen/build.py", line 675, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/app/tensorrt_llm/examples/qwen/build.py", line 498, in build_rank_engine
    load_from_awq_qwen(tensorrt_llm_qwen=tensorrt_llm_qwen,
  File "/app/tensorrt_llm/examples/qwen/weight.py", line 1035, in load_from_awq_qwen
    del groupwise_qweight_safetensors
UnboundLocalError: local variable 'groupwise_qweight_safetensors' referenced before assignment
It seems there's a bug here (I assume you're using the main branch).
A quick workaround is to comment out line 1035 of examples/qwen/weight.py, the statement del groupwise_qweight_safetensors.
The root cause is that you're passing a .pt checkpoint, so groupwise_qweight_safetensors is never assigned on that code path, and the unconditional del then fails.
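For anyone who wants to see the failure mode in isolation, here is a minimal sketch of the pattern. It is not the actual code in weight.py: the functions load_buggy and load_fixed, the awq_state variable, and the dict placeholders are hypothetical, and only the variable name groupwise_qweight_safetensors is taken from the traceback. It assumes the loader binds that name only when the checkpoint is a safetensors file and then deletes it unconditionally.

```python
def load_buggy(ckpt_path):
    # Hypothetical stand-in for the loader: the name is bound on one branch only.
    if ckpt_path.endswith(".safetensors"):
        groupwise_qweight_safetensors = {"source": ckpt_path}
    else:
        awq_state = {"source": ckpt_path}  # .pt path: the safetensors name is never bound
    # Unconditional cleanup: raises UnboundLocalError when a .pt file was given.
    del groupwise_qweight_safetensors
    return "loaded"


def load_fixed(ckpt_path):
    # Bind both names up front so the cleanup is valid on every branch.
    groupwise_qweight_safetensors = None
    awq_state = None
    if ckpt_path.endswith(".safetensors"):
        groupwise_qweight_safetensors = {"source": ckpt_path}
    else:
        awq_state = {"source": ckpt_path}
    # Deleting a name bound to None is fine; it just removes the binding.
    del groupwise_qweight_safetensors, awq_state
    return "loaded"
```

Calling load_buggy("model.pt") reproduces the UnboundLocalError, while load_fixed works with either extension. Commenting out the del, as suggested above, has the same effect for the .pt case; initializing the name first is just one way a permanent fix could look.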