
Size mismatch, Error(s) in loading model finetuned by lora

Open GuYith opened this issue 11 months ago • 6 comments

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [X] I have searched the FAQ

Current Behavior

I fine-tuned Qwen-7B with LoRA, and when I load the fine-tuned model I get the following error:

root@1fc7d6985d8b:/Fine/Qwen-main# python3 cli_demo.py
/usr/local/lib/python3.8/dist-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00,  1.30s/it]
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 151851. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
Traceback (most recent call last):
  File "cli_demo.py", line 217, in <module>
    main()
  File "cli_demo.py", line 123, in main
    model, tokenizer, config = _load_model_tokenizer(args)
  File "cli_demo.py", line 60, in _load_model_tokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/auto.py", line 128, in from_pretrained
    return cls._target_peft_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 353, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 697, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/usr/local/lib/python3.8/dist-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).
        size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).

I fine-tuned the model with:

bash finetune_lora_single_gpu.sh -d my_train_data.json

and I modified my cli_demo.py following the tutorial:

    model = AutoPeftModelForCausalLM.from_pretrained(
        model_path, # path to the output directory or model name
        device_map=device_map,
        trust_remote_code=True,
    ).eval()

I found some possibly related issues, such as #419 and #482, but they don't solve my problem.
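As a possible workaround (not something tried in this thread; a sketch that assumes the adapter's modules_to_save weights were saved with the full 151936-row embedding from config.json), one could load the base model explicitly, resize its embeddings to match the checkpoint, and then attach the adapter with peft's PeftModel instead of AutoPeftModelForCausalLM:

# Hypothetical sketch; both paths below are placeholders for illustration.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_path = "Qwen/Qwen-7B"      # base model that was fine-tuned (assumption)
adapter_path = "output_qwen"    # LoRA output directory (assumption)

model = AutoModelForCausalLM.from_pretrained(
    base_path, device_map="auto", trust_remote_code=True
)

# The adapter checkpoint stores wte / lm_head with 151936 rows (the config.json
# vocab_size), while AutoPeftModelForCausalLM resized the embedding to
# len(tokenizer) == 151851. Resize to the checkpoint's row count first.
model.resize_token_embeddings(151936)

model = PeftModel.from_pretrained(model, adapter_path).eval()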

Expected Behavior

The demo runs normally.

Steps To Reproduce

No response

Environment

- OS: Ubuntu 22.04.1
- Python: 3.8
- Transformers: 4.32.0
- PyTorch: 2.2.1
- CUDA: 12.1

Anything else?

No response

GuYith avatar Mar 09 '24 14:03 GuYith

Something seems wrong with the vocab_size (which is the size of the embedding, not the actual vocabulary size) in config.json and the pad_to_multiple_of setting.

Please first try upgrading transformers (keeping it below 4.38.0) and downgrading peft to below 0.8.0, and provide the content of your config.json.
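For reference, a minimal check (a sketch, not from the thread) of where the two numbers in the error come from, assuming the standard Qwen-7B-Chat files or the local fine-tune output directory:

# Diagnostic sketch (assumed path): compare the tokenizer length with the
# config's vocab_size, which is the padded embedding size rather than the
# number of tokens the tokenizer actually knows.
from transformers import AutoConfig, AutoTokenizer

path = "Qwen/Qwen-7B-Chat"  # or the local fine-tune output directory (assumption)

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
config = AutoConfig.from_pretrained(path, trust_remote_code=True)

print("len(tokenizer)    =", len(tokenizer))      # 151851, the number in the resize warning above
print("config.vocab_size =", config.vocab_size)   # 151936, the embedding row count in the checkpoint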

jklj077 avatar Mar 11 '24 04:03 jklj077

Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I hit some errors that I didn't see with peft==0.9.0 (screenshot attached).

Here is my config.json, which I copied from https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/config.json:

{
  "architectures": [
    "QWenLMHeadModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },
  "attn_dropout_prob": 0.0,
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 4096,
  "intermediate_size": 22016,
  "initializer_range": 0.02,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 32768,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.32.0",
  "use_cache": true,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

GuYith avatar Mar 11 '24 05:03 GuYith

There should be an adapter_config.json as well. Let's see what's there. I think peft is changing the vocab_size somewhere.
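To check that, one could dump adapter_config.json along with the shapes of the saved modules_to_save weights from the LoRA output directory (a sketch; the directory name is a placeholder and the file names are the usual peft defaults, which may differ):

# Sketch: inspect a peft LoRA output directory. "output_qwen" is a placeholder.
import json, os
import torch

adapter_dir = "output_qwen"

with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
    print(json.dumps(json.load(f), indent=2))

st_path = os.path.join(adapter_dir, "adapter_model.safetensors")
if os.path.exists(st_path):
    from safetensors.torch import load_file
    weights = load_file(st_path)
else:  # older peft versions save adapter_model.bin instead
    weights = torch.load(os.path.join(adapter_dir, "adapter_model.bin"), map_location="cpu")

for name, tensor in weights.items():
    if "modules_to_save" in name:
        print(name, tuple(tensor.shape))  # expect wte / lm_head rows here (151936 in this report)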

jklj077 avatar Mar 11 '24 07:03 jklj077

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions[bot] avatar Apr 20 '24 08:04 github-actions[bot]

Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I hit some errors that I didn't see with peft==0.9.0. [...]

Hi, have you solved this problem?

Aurora-slz avatar Apr 25 '24 02:04 Aurora-slz

Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I hit some errors that I didn't see with peft==0.9.0. [...]

Hi, have you solved this problem?

I'm sorry I didn't follow up on this issue.

GuYith avatar Apr 25 '24 06:04 GuYith

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions[bot] avatar Jun 01 '24 08:06 github-actions[bot]