
no attribute 'qkv_weight_' AttributeError when load Qwen-14B-Chat-Int4

Open jarviszeng-zjc opened this issue 1 year ago • 8 comments

Traceback (most recent call last):
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 116, in exposed_init_model
    raise e
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 82, in exposed_init_model
    self.model = QWenTpPartModelWQuant(model_kvargs)
  File "/data/code/lightllm/lightllm/models/qwen_wquant/model.py", line 17, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/models/qwen/model.py", line 27, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 31, in __init__
    super().__init__(kvargs)
  File "/data/code/lightllm/lightllm/common/basemodel/basemodel.py", line 44, in __init__
    self._init_weights()
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 93, in _init_weights
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/data/code/lightllm/lightllm/models/llama/model.py", line 93, in <listcomp>
    [weight.verify_load() for weight in self.trans_layers_weight]
  File "/data/code/lightllm/lightllm/models/qwen_wquant/layer_weights/transformer_layer_weight.py", line 86, in verify_load
    self.qkv_weight_,
AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'qkv_weight_'

quantize_config.json

{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "static_groups": false,
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
  "model_file_base_name": "model"
}
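For context, a quantize_config.json like the one above is what AutoGPTQ emits: it marks the checkpoint as 4-bit GPTQ with a group size of 128 and symmetric quantization. A minimal sketch (using only the config shown above, nothing model-specific) that parses and sanity-checks it:

```python
import json

# Minimal sketch: parse the quantize_config.json posted above and confirm
# it describes a 4-bit GPTQ checkpoint with group size 128. The config text
# is inlined here so the snippet is self-contained.
config_text = """
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "static_groups": false,
  "sym": true,
  "true_sequential": true,
  "model_name_or_path": null,
  "model_file_base_name": "model"
}
"""

cfg = json.loads(config_text)

# 4-bit weights, quantized in groups of 128 values sharing one scale.
print(f"bits={cfg['bits']}, group_size={cfg['group_size']}, sym={cfg['sym']}")
```

In practice you would read the file from the model directory instead of an inline string; the point is that `bits: 4` plus `group_size: 128` identifies this as a GPTQ-format checkpoint, whose weight tensors are stored in packed form rather than as plain fp16 matrices.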

jarviszeng-zjc avatar Dec 01 '23 01:12 jarviszeng-zjc

Thank you for your attention. Could you provide more details, such as the startup parameters?

shihaobai avatar Dec 01 '23 07:12 shihaobai

@shihaobai Thank you, here are the details:

python -m lightllm.server.api_server --model_dir /data/models/qwen/Qwen-14B-Chat-Int4 --trust_remote_code --max_total_token_num 3000 --max_req_input_len 2048 --max_req_total_len 2100 --tokenizer_mode auto --disable_log_stats --tp 2 --mode ppl_int4weight
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   37C    P0              26W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       Off | 00000000:00:0E.0 Off |                    0 |
| N/A   38C    P0              26W /  70W |      2MiB / 15360MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

jarviszeng-zjc avatar Dec 01 '23 08:12 jarviszeng-zjc

You can try --mode triton_int4weight; we have not yet open-sourced the ppl kernel. You can also check whether the key "transformer.h.{self.layer_num_}.attn.c_attn.weight" exists in your weight file.
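The key check the maintainer suggests can be scripted. Below is a hedged sketch: `missing_attn_keys` is a hypothetical helper (not part of lightllm) that, given the set of tensor names in a checkpoint, reports which layers lack the `attn.c_attn.weight` key the loader looks up. Note that a GPTQ checkpoint typically stores `.qweight`/`.qzeros`/`.scales` tensors instead of a plain `.weight`, which would make this lookup fail and is consistent with the `qkv_weight_` never being set:

```python
# Hypothetical helper: given the tensor names found in a checkpoint,
# report which transformer layers are missing the fused attention
# projection key that the Qwen loader expects.
def missing_attn_keys(weight_keys, num_layers):
    """Return layer indices whose 'attn.c_attn.weight' key is absent."""
    missing = []
    for i in range(num_layers):
        key = f"transformer.h.{i}.attn.c_attn.weight"
        if key not in weight_keys:
            missing.append(i)
    return missing


# A plain fp16 checkpoint exposes '.weight' tensors directly:
fp16_keys = {f"transformer.h.{i}.attn.c_attn.weight" for i in range(2)}

# A GPTQ checkpoint instead stores packed tensors ('.qweight' etc.),
# so the '.weight' lookup finds nothing:
gptq_keys = {f"transformer.h.{i}.attn.c_attn.qweight" for i in range(2)}

print(missing_attn_keys(fp16_keys, 2))  # no layers missing
print(missing_attn_keys(gptq_keys, 2))  # every layer missing
```

To get the real key set from a model directory you could, for example, open the checkpoint's safetensors/bin shards and collect their tensor names, then pass that set to the helper; the helper itself only needs the names, not the tensors.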

shihaobai avatar Dec 01 '23 08:12 shihaobai

@shihaobai It's the same error; execution fails before reaching the code that distinguishes between modes.

jarviszeng-zjc avatar Dec 03 '23 09:12 jarviszeng-zjc

Did you check whether the key 'transformer.h.{self.layer_num_}.attn.c_attn.weight' exists in your weight file?

shihaobai avatar Dec 04 '23 06:12 shihaobai

Qwen-14B-Chat-Int4 weights are not supported yet.

hiworldwzj avatar Dec 04 '23 09:12 hiworldwzj

Qwen-14B-Chat-Int4 weights are not supported yet.

👌🏻

jarviszeng-zjc avatar Dec 05 '23 15:12 jarviszeng-zjc

Qwen-14B-Chat-Int4 weights are not supported yet.

What types of quantization does Qwen currently support?

mafamily2496 avatar Dec 17 '23 06:12 mafamily2496