ChatGLM-Efficient-Tuning
Using LoRA to fine-tune ChatGLM2: error when loading the model
```
Generation config file not found, using a generation config created from the model config.
07/07/2023 16:36:35 - INFO - utils.common - Fine-tuning method: LoRA
Traceback (most recent call last):
File "……/ChatGLM-Efficient-Tuning/src/train_sft.py", line 105, in <module>
main()
File "……/ChatGLM-Efficient-Tuning/src/train_sft.py", line 25, in main
model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
File "……/ChatGLM-Efficient-Tuning/src/utils/common.py", line 244, in load_pretrained
model = init_adapter(model, model_args, finetuning_args, is_trainable)
File "……/ChatGLM-Efficient-Tuning/src/utils/common.py", line 117, in init_adapter
model = PeftModel.from_pretrained(model, checkpoint)
File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/peft_model.py", line 181, in from_pretrained
model.load_adapter(model_id, adapter_name, **kwargs)
File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/peft_model.py", line 376, in load_adapter
set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 123, in set_peft_model_state_dict
model.load_state_dict(peft_model_state_dict, strict=False)
File "……/miniconda3/envs/glm_tuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
size mismatch for base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
size mismatch for base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([64, 4096]).
size mismatch for base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4608, 64]).
... (identical lora_A [64, 4096] / lora_B [4608, 64] vs. checkpoint torch.Size([0]) size mismatches repeat for base_model.model.transformer.encoder.layers.2 through .27) ...
```
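
Worth noting: every mismatch above reports `torch.Size([0])` on the checkpoint side, i.e. the saved LoRA tensors appear to be empty rather than shaped for a different rank. A minimal sketch (the checkpoint path is assumed) for inspecting what actually got written into `adapter_model.bin`:

```python
# Minimal sketch: print the shapes stored in a saved LoRA adapter.
# The path below is assumed; point it at your own --output_dir.
import torch

state_dict = torch.load(
    "glm/path_to_sft_checkpoint_glm2_6b_7_7/adapter_model.bin",
    map_location="cpu",
)
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# If every lora_A/lora_B tensor prints as an empty shape such as (0,),
# the adapter was already broken at save time, not at load time.
```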
Training script:

```
CUDA_VISIBLE_DEVICES=2,3 accelerate launch src/train_sft.py \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 64 \
    --output_dir glm/path_to_sft_checkpoint_glm2_6b_7_7 \
    --per_device_train_batch_size 64 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 100 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 2 \
    --fp16 \
    --use_v2 true \
    --model_name_or_path pretrain_model/chatglm2-6b
```
Evaluation script:

```
CUDA_VISIBLE_DEVICES=3 python src/train_sft.py \
    --model_name_or_path pretrain_model/chatglm2-6b \
    --do_eval \
    --use_v2 \
    --finetuning_type lora \
    --lora_rank 64 \
    --fp16 \
    --dataset alpaca_gpt4_zh \
    --checkpoint_dir glm/path_to_sft_checkpoint_glm2_6b_7_7 \
    --output_dir glm/path_to_sft_checkpoint_glm2_6b_7_7/res \
    --per_device_eval_batch_size 64 \
    --predict_with_generate
```
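
As a side note, `PeftModel.from_pretrained` reads the adapter hyperparameters from `adapter_config.json` inside `--checkpoint_dir`, so the rank used at load time comes from the saved checkpoint itself; a quick way to confirm what was stored (path assumed):

```python
# Quick check (checkpoint path assumed): print the rank and target modules
# recorded with the saved adapter; PeftModel.from_pretrained uses these.
import json

with open("glm/path_to_sft_checkpoint_glm2_6b_7_7/adapter_config.json") as f:
    adapter_cfg = json.load(f)
print("r =", adapter_cfg.get("r"))
print("target_modules =", adapter_cfg.get("target_modules"))
```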
@ykf173 --lora_rank 64 doesn't really work; I hit the same error. --lora_rank 16 works, but --lora_rank 32 and 64 both fail.
Good to know it works. So if I want to increase --lora_rank, it looks like I'll have to modify the code.
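
For reference, in PEFT itself the LoRA rank is just the `r` field of `LoraConfig`; a minimal sketch of attaching a rank-64 adapter to ChatGLM2 (the `target_modules`, `lora_alpha`, and `lora_dropout` values here are assumptions, not necessarily the defaults used by `train_sft.py`):

```python
# Minimal sketch (not this repo's exact code): attaching a rank-64 LoRA
# adapter with PEFT. target_modules, lora_alpha and lora_dropout are assumed.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModel

base_model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,                                # what --lora_rank controls
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM2 fused QKV projection
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # sanity check: LoRA params should be non-empty
```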
I tested this on my local machine and it works fine. My test arguments were:

```
#!/bin/bash
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path chatglm2 \
    --use_v2 \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 32 \
    --output_dir out/debug_sft_v2 \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 1e-4 \
    --num_train_epochs 1.0 \
    --max_samples 1000 \
    --plot_loss \
    --fp16

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path chatglm2 \
    --use_v2 \
    --do_predict \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --checkpoint_dir out/debug_sft_v2 \
    --output_dir out/debug_sft_pred_v2 \
    --overwrite_cache \
    --overwrite_output_dir \
    --max_samples 30 \
    --per_device_eval_batch_size 8 \
    --predict_with_generate \
    --fp16
```

The test environment is a single V100 32G; the problem in this issue may be caused by running out of GPU memory.
It does produce the corresponding LoRA files, but loading them fails, and the generated adapter_model.bin is always 16.34 KB. GPU memory did not overflow; I kept the log file and the run completed in full. When I then load the LoRA model, regardless of whether lora_rank is 32 or 64, adapter_model.bin is always 16.34 KB and loading fails with the same error as in the first post.
@happy-xlf That file size is clearly wrong.
@hiyouga Right. When you set lora_rank to 32/64, is the generated file size normal?
@happy-xlf Yes, the file size is normal on my side.
Could it be a multi-GPU issue? I used 4 A100s for SFT and 1 card for inference testing; the result is the same as what @happy-xlf described, the adapter file is 17 KB, and when I went back through the tmux logs the run had completed in full.
@ykf173 I'm using V100s, 8 GPUs on a single machine, and the output is also 17 KB. Could it be an accelerate problem?
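
If the training run really was sharded across GPUs (for example DeepSpeed ZeRO-3 configured through accelerate), then `model.state_dict()` on a single rank can contain partitioned, zero-length tensors, which would match both the ~17 KB `adapter_model.bin` and the `torch.Size([0])` shapes in the original error. A hedged sketch, assuming an accelerate-based training loop, of gathering the full weights before saving (`model` and the output path below are placeholders, not this repo's exact code):

```python
# Hedged sketch (placeholders, not this repo's exact code): when saving a
# PEFT/LoRA model trained under accelerate with a sharded backend such as
# DeepSpeed ZeRO-3, gather the full parameters first; saving directly from a
# partitioned model can write (near-)empty adapter weights.
from accelerate import Accelerator

accelerator = Accelerator()
# `model` stands for the PEFT model returned by accelerator.prepare(...)
# earlier in the training loop.

unwrapped_model = accelerator.unwrap_model(model)
full_state_dict = accelerator.get_state_dict(model)  # gathers sharded parameters
if accelerator.is_main_process:
    unwrapped_model.save_pretrained(
        "glm/path_to_sft_checkpoint_glm2_6b_7_7",  # output path assumed
        state_dict=full_state_dict,
    )
```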