
After SFT, merging the LoRA weights noticeably degrades answer quality (not Issue #2505, #4913)

Open BGbigbear opened this issue 1 year ago • 7 comments

Reminder

  • [X] I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • PyTorch version: 2.4.0 (GPU)
  • Transformers version: 4.43.4
  • Datasets version: 3.0.2
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800 80GB PCIe
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.43.1
  • vLLM version: 0.5.4

Reproduction

I ran into the same problem as Issues #2505 and #4913, but after investigating I ruled out the causes described in those two issues.

Test sample before merging (tested on the SFT data; the model outputs well-formed, standardized JSON): (image)

Test sample after merging (garbled output): (image)

The fine-tuning, merge, and test scripts and parameters are given below:

  1. Fine-tuning script and parameters
llamafactory-cli train examples/train_lora/gemma2_lora_sft_causality.yaml
### model
model_name_or_path: /home/ckf/models/gemma-2-27b-it

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 8
lora_alpha: 16
lora_dropout: 0
deepspeed: examples/deepspeed/ds_z2_config.json

### dataset
dataset: causality_train2_cot
template: gemma
cutoff_len: 8192
max_samples: null
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
  2. Merge script and parameters
llamafactory-cli export examples/merge_lora/gemma2_lora_sft.yaml
### model
model_name_or_path: /home/ckf/models/gemma-2-27b-it/
adapter_name_or_path: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot/checkpoint-2000/
template: gemma
finetuning_type: lora

### export
export_dir: /home/lu/models/gemma2_lora_sft_causality_train2_cot
export_size: 5
export_device: auto
export_legacy_format: false
  3. Test command
llamafactory-cli chat examples/inference/gemma2_lora_sft_causality.yaml

Contents of gemma2_lora_sft_causality.yaml before merging:

model_name_or_path: /home/ckf/models/gemma-2-27b-it
adapter_name_or_path: /home/lu/saves/gemma2-27b-it/lora/sft_causality_train2_cot/checkpoint-2000/
template: gemma
finetuning_type: lora

Contents of gemma2_lora_sft_causality.yaml after merging:

model_name_or_path: /home/lu/models/gemma2_lora_sft_causality_train2_cot
template: gemma

Expected behavior

After merging, the model should output the same standardized JSON as it did before merging.
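The expectation rests on an algebraic identity: merging replaces each targeted weight W with W + (lora_alpha/lora_rank)·B·A, so in exact (or full-precision) arithmetic the merged model computes the same function as base model plus adapter. A minimal NumPy sketch of that identity, with hypothetical dimensions (this is not LLaMA-Factory code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                       # hidden size (hypothetical) and lora_rank
scale = 16 / 8                     # lora_alpha / lora_rank from the config above

W = rng.standard_normal((d, d)).astype(np.float32)           # base weight
A = (0.01 * rng.standard_normal((r, d))).astype(np.float32)  # LoRA down-projection
B = (0.01 * rng.standard_normal((d, r))).astype(np.float32)  # LoRA up-projection
x = rng.standard_normal((1, d)).astype(np.float32)           # one activation vector

# Adapter kept separate: base path plus scaled low-rank path
y_adapter = x @ W.T + scale * (x @ A.T) @ B.T
# Adapter folded in: a single matmul against the merged weight
y_merged = x @ (W + scale * B @ A).T

assert np.allclose(y_adapter, y_merged, atol=1e-4)  # identical up to fp32 rounding
```

In float32 the two paths agree to rounding error, which suggests the divergence reported here comes from something in the merge/export pipeline rather than from the math itself.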

Others

No response

BGbigbear avatar Nov 04 '24 09:11 BGbigbear

So far I have only seen this problem with gemma-2-27b-it; other models, such as qwen-2.5-32b-it, are unaffected.

BGbigbear avatar Nov 05 '24 09:11 BGbigbear

glm4 has the same problem.

zzk2021 avatar Feb 07 '25 06:02 zzk2021

We hit a similar phenomenon on a classification task: after training, the metrics from loading each checkpoint and evaluating it separately were clearly worse than the eval metrics logged during training. We reproduced it with a minimal case using do_eval: true, eval_steps: 30, save_steps: 30, max_steps: 32, as shown below. We eventually traced it to peft's merge_and_unload(): after commenting that call out and re-running the standalone eval, the metrics matched those from eval during training.

(image)

Code change:


        for adapter in adapter_to_merge:
            model: "LoraModel" = PeftModel.from_pretrained(model, adapter, **init_kwargs)
            # model = model.merge_and_unload()    # commented out: skipping the merge restores the eval metrics

Aurelius84 avatar Mar 31 '25 06:03 Aurelius84
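One plausible mechanism for merge_and_unload() changing behavior (an assumption, not confirmed in this thread) is precision loss when the update is folded into half-precision weights: bf16 has only a 7-bit mantissa, so a LoRA delta that is small relative to the base weight can be rounded away or distorted in W + ΔW, whereas the unmerged adapter adds its contribution on a separate path with float32 matmul accumulation. A NumPy sketch using float16 (10-bit mantissa, so bf16 is even coarser) as a stand-in:

```python
import numpy as np

# float16 as a stand-in for bf16: at magnitude 1024 the gap between
# representable float16 values (one ulp) is 1.0, so a small LoRA-style
# delta of 0.25 rounds away entirely when folded into the weight.
w = np.float16(1024.0)
delta = np.float16(0.25)
merged = np.float16(w + delta)      # in-place merge in half precision
print(merged == w)                  # True: the update vanished

# The unmerged adapter applies the same delta on a separate path; with
# float32 accumulation (as in typical GPU matmuls) the contribution survives.
x = 2.0
y_merged = np.float32(x) * np.float32(merged)                                  # 2048.0
y_adapter = np.float32(x) * np.float32(w) + np.float32(x) * np.float32(delta)  # 2048.5
print(y_adapter - y_merged)         # 0.5
```

If this is indeed the cause, performing the merge in float32 and only casting the merged weights down afterwards would be worth trying.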

> So far I have only seen this problem with gemma-2-27b-it; other models, such as qwen-2.5-32b-it, are unaffected.

I also hit this problem when fine-tuning glm4-9b.

chenchen333-dev avatar Jun 10 '25 11:06 chenchen333-dev

qwen2.5vl-7b-instruct also has it.

lukasindeed avatar Aug 07 '25 07:08 lukasindeed

qwen2.5vl-7b-instruct also degrades noticeably.

rosmarinocc avatar Aug 15 '25 07:08 rosmarinocc

Has this been solved? Is it a model-specific problem? I also ran into it when merging weights after fine-tuning llada.

deadlykitten4 avatar Oct 15 '25 05:10 deadlykitten4