After SFT, merging LoRA weights noticeably degrades answer quality (not the cause in Issue #2505 or #4913)
Reminder
- [X] I have read the README and searched the existing issues.
System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.35
- Python version: 3.11.9
- PyTorch version: 2.4.0 (GPU)
- Transformers version: 4.43.4
- Datasets version: 3.0.2
- Accelerate version: 1.0.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A800 80GB PCIe
- DeepSpeed version: 0.14.4
- Bitsandbytes version: 0.43.1
- vLLM version: 0.5.4
Reproduction
I ran into the same problem as Issue #2505 and #4913, but after investigation I ruled out the causes described in those two issues.
Test sample before merging (tested on the SFT data; the model outputs well-formed JSON):
Test sample after merging (the model outputs garbage):
The fine-tuning, merge, and test scripts and parameters are provided below:
- Fine-tuning script and parameters
```
llamafactory-cli train examples/train_lora/gemma2_lora_sft_causality.yaml
```

```yaml
### model
model_name_or_path: /home/ckf/models/gemma-2-27b-it

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 8
lora_alpha: 16
lora_dropout: 0
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: causality_train2_cot
template: gemma
cutoff_len: 8192
max_samples: null
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```
- Merge script and parameters
```
llamafactory-cli export examples/merge_lora/gemma2_lora_sft.yaml
```

```yaml
### model
model_name_or_path: /home/ckf/models/gemma-2-27b-it/
adapter_name_or_path: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot/checkpoint-2000/
template: gemma
finetuning_type: lora

### export
export_dir: /home/lu/models/gemma2_lora_sft_causality_train2_cot
export_size: 5
export_device: auto
export_legacy_format: false
```
- Test script and parameters
```
llamafactory-cli chat examples/inference/gemma2_lora_sft_causality.yaml
```

Contents of gemma2_lora_sft_causality.yaml before merging:

```yaml
model_name_or_path: /home/ckf/models/gemma-2-27b-it
adapter_name_or_path: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot/checkpoint-2000/
template: gemma
finetuning_type: lora
```

Contents of gemma2_lora_sft_causality.yaml after merging:

```yaml
model_name_or_path: /home/lu/models/gemma2_lora_sft_causality_train2_cot
template: gemma
```
Expected behavior
After merging, the model should produce the same well-formed JSON output as before merging.
Others
No response
So far I have only seen this problem on gemma-2-27b-it; other models such as qwen-2.5-32b-it are unaffected.
glm4 also shows the problem.
We hit a similar phenomenon on a classification task: loading each checkpoint after training and evaluating it separately gives clearly worse metrics than the eval run during training. We reproduced it with a minimal case using `do_eval: true`, `eval_steps: 30`, `save_steps: 30`, `max_steps: 32` (see the attached figure). We finally traced it to peft's `merge_and_unload()`: after commenting it out and re-running the standalone eval, the metrics matched the in-training eval.
Code change (in LLaMA-Factory's adapter-loading code):

```python
for adapter in adapter_to_merge:
    model: "LoraModel" = PeftModel.from_pretrained(model, adapter, **init_kwargs)
    # model = model.merge_and_unload()  # commenting out this line restores the metrics
```
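For context, `merge_and_unload()` folds each adapter into the base weights as W' = W + (alpha/r)·B·A, which is mathematically a no-op for the forward pass. One possible contributor to the divergence (not confirmed in this thread) is that the fold happens in half precision, where an update smaller than a weight's rounding step is partially lost. A minimal numpy sketch; only `r = 8` and `alpha = 16` come from the training config above, the shapes and scales are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
r, d = 8, 64          # lora_rank from the config above; d is an illustrative width
alpha = 16            # lora_alpha from the config above

W = rng.standard_normal((d, d)).astype(np.float16)           # base weight in half precision
A = (0.01 * rng.standard_normal((r, d))).astype(np.float32)  # LoRA down-projection
B = (0.01 * rng.standard_normal((d, r))).astype(np.float32)  # LoRA up-projection

delta = (alpha / r) * (B @ A)        # the LoRA update, computed in fp32

merged_fp32 = W.astype(np.float32) + delta   # merge carried out in full precision
merged_fp16 = W + delta.astype(np.float16)   # merge carried out directly in half precision

# The two merges disagree by rounding error; across hundreds of layers this can add up.
err = np.abs(merged_fp32 - merged_fp16.astype(np.float32)).max()
print(f"max per-weight merge discrepancy: {err:.2e}")
```

This only illustrates why a mathematically equivalent merge can still shift the logits slightly; on its own it does not explain a collapse into garbage output, so the actual root cause may lie elsewhere.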
Fine-tuning glm4-9b also shows this problem.
qwen2.5vl-7b-instruct is also affected, with a noticeable quality drop.
Has this been resolved? Is it a model-specific problem? I also hit it when merging weights after fine-tuning llada.