AttributeError: no attribute 'qkv_weight_' when loading Qwen-14B-Chat-Int4
Traceback (most recent call last):
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 116, in exposed_init_model
    raise e
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 82, in exposed_init_model
    self.model = QWenTpPartModelWQuant(model_kvargs)
  File "/data/code/lightllm/lightllm/models/qwen_wquant/model.py", line 17, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/models/qwen/model.py", line 27, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 31, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/common/basemodel/basemodel.py", line 44, in __init__
    self._init_weights()
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 93, in _init_weights
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 93, in <listcomp>
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/data/code/lightllm/lightllm/models/qwen_wquant/layer_weights/transformer_layer_weight.py", line 86, in verify_load
    self.qkv_weight_,
AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'qkv_weight_'
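(For context: the attribute is presumably only assigned while the checkpoint is being loaded, so if the file's keys don't match what the loader looks for, verify_load trips over a name that was never set. A minimal sketch of that pattern with hypothetical names, not lightllm's actual code:)

class LayerWeightSketch:
    # Hypothetical illustration of the failure mode, not lightllm's code.
    def load_hf_weights(self, weights):
        key = "transformer.h.0.attn.c_attn.weight"
        if key in weights:  # a GPTQ checkpoint stores qweight/qzeros/scales instead
            self.qkv_weight_ = weights[key]

    def verify_load(self):
        # Referencing the attribute raises AttributeError when load_hf_weights
        # never assigned it -- the same failure as in the traceback above.
        assert self.qkv_weight_ is not None

demo = LayerWeightSketch()
demo.load_hf_weights({"transformer.h.0.attn.c_attn.qweight": object()})
demo.verify_load()  # AttributeError: ... has no attribute 'qkv_weight_'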
quantize_config.json:
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "static_groups": false,
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
  "model_file_base_name": "model"
}
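(The config above is a plain GPTQ export: 4-bit, group size 128, symmetric quantization. A quick sanity check, assuming the same model_dir as in the startup command below:)

import json

# Path assumed from the --model_dir used later in this thread.
with open("/data/models/qwen/Qwen-14B-Chat-Int4/quantize_config.json") as f:
    cfg = json.load(f)

print(cfg["bits"], cfg["group_size"], cfg["sym"])  # expected: 4 128 True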
Thank you for your attention. Could you provide more details, such as the startup parameters?
@shihaobai Thank you, here is the detailed information:
python -m lightllm.server.api_server --model_dir /data/models/qwen/Qwen-14B-Chat-Int4 --trust_remote_code --max_total_token_num 3000 --max_req_input_len 2048 --max_req_total_len 2100 --tokenizer_mode auto --disable_log_stats --tp 2 --mode ppl_int4weight
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   37C    P0              26W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       Off | 00000000:00:0E.0 Off |                    0 |
| N/A   38C    P0              26W /  70W |      2MiB / 15360MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
You can try --mode triton_int4weight; we have not yet open-sourced the ppl kernel. You can also check whether the key exists in your weight file: "transformer.h.{self.layer_num_}.attn.c_attn.weight"
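(A quick way to list the checkpoint keys, as a sketch: it assumes the weights ship as safetensors shards under model_dir; for *.bin shards, load each with torch.load(shard, map_location="cpu") and collect the dict keys instead.)

import glob
from safetensors import safe_open

model_dir = "/data/models/qwen/Qwen-14B-Chat-Int4"  # assumed path
wanted = "transformer.h.0.attn.c_attn.weight"

keys = set()
for shard in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())

print(wanted in keys)
# GPTQ exports usually carry c_attn.qweight / c_attn.qzeros / c_attn.scales
# rather than c_attn.weight, which would explain the missing attribute.
print(sorted(k for k in keys if "h.0.attn.c_attn" in k))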
@shihaobai It’s the same error; execution hasn’t even reached the code that distinguishes between the modes yet.
Did you check whether the key exists in your weight file: 'transformer.h.{self.layer_num_}.attn.c_attn.weight'?
Qwen-14B-Chat-Int4 weights are not supported yet.
👌🏻
What types of quantization does qwen currently support?