
Fine-tuning ChatGLM2 with LoRA: error when loading the model

Open ykf173 opened this issue 1 year ago • 10 comments

```
Generation config file not found, using a generation config created from the model config.
07/07/2023 16:36:35 - INFO - utils.common - Fine-tuning method: LoRA
Traceback (most recent call last):
 File "……/ChatGLM-Efficient-Tuning/src/train_sft.py", line 105, in <module>
   main()
 File "……/ChatGLM-Efficient-Tuning/src/train_sft.py", line 25, in main
   model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
 File "……/ChatGLM-Efficient-Tuning/src/utils/common.py", line 244, in load_pretrained
   model = init_adapter(model, model_args, finetuning_args, is_trainable)
 File "……/ChatGLM-Efficient-Tuning/src/utils/common.py", line 117, in init_adapter
   model = PeftModel.from_pretrained(model, checkpoint)
 File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/peft_model.py", line 181, in from_pretrained
   model.load_adapter(model_id, adapter_name, **kwargs)
 File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/peft_model.py", line 376, in load_adapter
   set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
 File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 123, in set_peft_model_state_dict
   model.load_state_dict(peft_model_state_dict, strict=False)
 File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
   raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
       size mismatch for base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.2.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.2.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.3.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.3.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.4.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.4.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.5.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.5.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.6.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.6.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.7.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.7.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.8.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.8.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.9.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.9.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.10.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.10.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.11.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.11.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.12.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.12.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.13.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.13.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.14.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.14.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.15.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.15.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.16.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.16.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.17.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.17.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.18.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.18.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.19.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.19.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.20.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.20.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.21.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.21.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.22.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.22.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.23.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.23.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.24.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.24.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.25.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.25.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.26.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.26.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
       size mismatch for base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
       size mismatch for base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
```

ykf173 avatar Jul 07 '23 08:07 ykf173

Training script:

```
CUDA_VISIBLE_DEVICES=2,3 accelerate launch src/train_sft.py \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 64 \
    --output_dir glm/path_to_sft_checkpoint_glm2_6b_7_7 \
    --per_device_train_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 100 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 2 \
    --fp16 \
    --use_v2 true \
    --model_name_or_path pretrain_model/chatglm2-6b
```

Evaluation script:

```
CUDA_VISIBLE_DEVICES=3 python src/train_sft.py \
    --model_name_or_path pretrain_model/chatglm2-6b \
    --do_eval \
    --use_v2 \
    --finetuning_type lora \
    --lora_rank 64 \
    --fp16 \
    --dataset alpaca_gpt4_zh \
    --checkpoint_dir glm/path_to_sft_checkpoint_glm2_6b_7_7 \
    --output_dir glm/path_to_sft_checkpoint_glm2_6b_7_7/res \
    --per_device_eval_batch_size 64 \
    --predict_with_generate
```

ykf173 avatar Jul 07 '23 08:07 ykf173
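
One way to narrow this down is to check whether the saved checkpoint itself already contains empty LoRA tensors, rather than something going wrong at load time. A minimal sketch, pointing at the `--checkpoint_dir` used above (the path is illustrative):

```python
import torch

# Illustrative path: point this at the --checkpoint_dir used for training.
adapter_path = "glm/path_to_sft_checkpoint_glm2_6b_7_7/adapter_model.bin"

state_dict = torch.load(adapter_path, map_location="cpu")
print(f"{len(state_dict)} tensors in the adapter checkpoint")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# If every lora_A / lora_B weight prints as (0,), or the file holds no
# tensors at all, the adapter was already saved without its weights and
# the size-mismatch error at load time is only a symptom.
```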

@ykf173 --lora_rank 64 doesn't really work; I get the same error. --lora_rank 16 works, but --lora_rank 32 and 64 both fail.

happy-xlf avatar Jul 08 '23 04:07 happy-xlf

> @ykf173 --lora_rank 64 doesn't really work; I get the same error. --lora_rank 16 works, but --lora_rank 32 and 64 both fail.

Good to know. So if we want a larger --lora_rank, we'd have to modify the code, then.

ykf173 avatar Jul 08 '23 10:07 ykf173
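
Before changing any code, it may also be worth confirming that the rank recorded in the saved adapter_config.json matches the --lora_rank passed at training time. A small hedged check (the path is illustrative):

```python
import json

# Illustrative path: adapter_config.json sits next to adapter_model.bin.
with open("glm/path_to_sft_checkpoint_glm2_6b_7_7/adapter_config.json") as f:
    cfg = json.load(f)

# "r" is the LoRA rank PEFT uses when rebuilding the adapter; it should
# equal the --lora_rank used for training, otherwise loading will also
# produce shape errors.
print(cfg.get("r"), cfg.get("lora_alpha"), cfg.get("target_modules"))
```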

I tested this on my local machine without any problem. My test arguments were:

```
#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path chatglm2 \
    --use_v2 \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 32 \
    --output_dir out/debug_sft_v2 \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 1e-4 \
    --num_train_epochs 1.0 \
    --max_samples 1000 \
    --plot_loss \
    --fp16

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path chatglm2 \
    --use_v2 \
    --do_predict \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --checkpoint_dir out/debug_sft_v2 \
    --output_dir out/debug_sft_pred_v2 \
    --overwrite_cache \
    --overwrite_output_dir \
    --max_samples 30 \
    --per_device_eval_batch_size 8 \
    --predict_with_generate \
    --fp16
```

The test environment is a single V100 32G. The problem in this issue may be caused by running out of GPU memory.

hiyouga avatar Jul 08 '23 10:07 hiyouga

> I tested this on my local machine without any problem. My test arguments were: [same training and prediction scripts as in the comment above]. The test environment is a single V100 32G; the problem in this issue may be caused by running out of GPU memory.

Training does produce the corresponding LoRA files, but loading them fails. The generated adapter_model.bin is always 16.34 KB. GPU memory did not run out; I kept the log file and checked that the run finished completely. When I then loaded the LoRA model, I found that whether lora_rank = 32 or 64, the generated adapter_model.bin is always 16.34 KB, and loading it fails with the same error as in the first post.

happy-xlf avatar Jul 09 '23 02:07 happy-xlf
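
For reference, a rank-64 adapter over the 28 query_key_value projections shown in the traceback should be on the order of tens of megabytes, so a 16.34 KB file essentially contains no weights at all. A rough back-of-the-envelope estimate based on the shapes in the error above:

```python
layers = 28            # encoder.layers.0 ... 27 in the traceback
rank = 64
lora_a = rank * 4096   # lora_A weight: (64, 4096)
lora_b = 4608 * rank   # lora_B weight: (4608, 64)
params = layers * (lora_a + lora_b)   # ~15.6M trainable parameters
print(params * 2 / 1024 ** 2)         # ~30 MB in fp16 (~60 MB in fp32)
```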

@happy-xlf That file size is clearly wrong.

hiyouga avatar Jul 09 '23 02:07 hiyouga

@hiyouga Right. When you set lora_rank = 32/64, is the generated file size normal on your end?

happy-xlf avatar Jul 09 '23 02:07 happy-xlf

@happy-xlf Yes, the file size is normal on my side.

hiyouga avatar Jul 09 '23 02:07 hiyouga

> @happy-xlf Yes, the file size is normal on my side.

Could it be a multi-GPU issue? I used 4 A100s for SFT and 1 for inference testing; the result is the same as what @happy-xlf described: the model is 17 KB, and when I went back through the tmux logs they were complete as well.

ykf173 avatar Jul 10 '23 01:07 ykf173

@ykf173 I'm using V100s on a single node with 8 GPUs and I also get 17 KB. Could it be an accelerate problem?

happy-xlf avatar Jul 10 '23 02:07 happy-xlf
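
A plausible explanation for the ~17 KB files in the multi-GPU runs is that the adapter was written while the parameters were still wrapped or sharded by the distributed launcher (DDP or DeepSpeed ZeRO under accelerate), so PEFT serialized empty tensors. The sketch below shows the general save pattern that gathers the full state dict first; it is not the repository's actual code, and names like `model` and `output_dir` are illustrative:

```python
from accelerate import Accelerator

accelerator = Accelerator()
output_dir = "glm/path_to_sft_checkpoint_glm2_6b_7_7"  # illustrative

# ... here `model` is assumed to be a PeftModel that was passed through
# accelerator.prepare(...) and trained ...

accelerator.wait_for_everyone()
# get_state_dict() gathers parameters that ZeRO may have sharded across
# ranks, so it must run on every process, not only on rank 0.
state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    # unwrap_model() strips the DDP/DeepSpeed wrapper so PEFT's
    # save_pretrained() sees the PeftModel and writes adapter_model.bin.
    unwrapped = accelerator.unwrap_model(model)
    unwrapped.save_pretrained(output_dir, state_dict=state_dict)
```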