
Fine-tuning qwen-7b-chat with LLaMA-Factory using the same parameters as finetune.py: why is there such a large gap in results, and what aspects can I analyze?

Open · sunyclj opened this issue 3 months ago · 3 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

LLaMA-Factory fine-tuning arguments:

deepspeed --num_gpus 2 src/train_bash.py
--deepspeed ./examples/sft/ds_config_zero2_new.json
--stage sft
--do_train
--model_name_or_path './qwen/Qwen-7B-Chat'
--dataset luxun_alpace
--finetuning_type lora
--lora_target c_attn,c_proj,w1,w2
--output_dir qwen1.0_7b_chat
--overwrite_cache
--num_train_epochs 60
--per_device_train_batch_size 2
--gradient_accumulation_steps 8
--save_strategy epoch
--lr_scheduler_type cosine
--save_steps 1000
--learning_rate 3e-4
--logging_strategy epoch
--cutoff_len 1024
--weight_decay 0.1
--adam_beta2 0.95
--warmup_ratio 0.01
--plot_loss
--fp16
--lora_rank 1
--lora_alpha 2
--lora_dropout 0.05
--template default

finetune.py fine-tuning arguments:

torchrun $DISTRIBUTED_ARGS finetune.py
--model_name_or_path $MODEL
--data_path $DATA
--bf16 True
--output_dir lora_finetune_ds/Qwen-7B-chat-lora
--num_train_epochs 60
--per_device_train_batch_size 2
--per_device_eval_batch_size 1
--gradient_accumulation_steps 8
--evaluation_strategy "no"
--save_strategy "epoch"
--save_steps 1000
--learning_rate 3e-4
--weight_decay 0.1
--adam_beta2 0.95
--warmup_ratio 0.01
--lr_scheduler_type "cosine"
--logging_strategy "epoch"
--logging_steps 1
--report_to "none"
--model_max_length 1024
--lazy_preprocess True
--use_lora
--gradient_checkpointing
--deepspeed finetune/ds_config_zero2.json
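
As a sanity check that the two runs really ended up with the same LoRA setup, the adapter configurations they save can be diffed directly. This is a minimal sketch, assuming both trainers write a PEFT-style adapter_config.json into their output directories (the exact location may be a checkpoint-* subfolder); the key list is illustrative:

```python
# Sketch: compare the LoRA adapter configs actually written by the two runs.
# Assumes both trainers save a PEFT adapter_config.json; paths follow the
# --output_dir values above but may need to point at a checkpoint-* subfolder.
import json
from pathlib import Path

def load_adapter_config(output_dir: str) -> dict:
    """Read the adapter_config.json that PEFT writes next to the LoRA weights."""
    return json.loads((Path(output_dir) / "adapter_config.json").read_text())

cfg_factory = load_adapter_config("qwen1.0_7b_chat")                      # LLaMA-Factory run
cfg_official = load_adapter_config("lora_finetune_ds/Qwen-7B-chat-lora")  # finetune.py run

keys = ["r", "lora_alpha", "lora_dropout", "target_modules", "bias", "task_type"]
for key in keys:
    a, b = cfg_factory.get(key), cfg_official.get(key)
    flag = "" if a == b else "  <-- differs"
    print(f"{key}: {a} vs {b}{flag}")
```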

Expected behavior

The LoRA-related parameters are identical as well. Looking at the fine-tuning logs, the grad_norm value in the LLaMA-Factory log keeps changing from step to step. I did not set --max_grad_norm, so the default of 1 applies. Why does grad_norm change, and does it have much to do with the final results being worse?
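
For intuition on the grad_norm logging, here is a minimal PyTorch sketch of what a trainer does each step when max_grad_norm is 1.0; it mirrors generic Hugging Face Trainer / PyTorch clipping behaviour rather than LLaMA-Factory's or DeepSpeed's exact internals. torch.nn.utils.clip_grad_norm_ returns the total gradient norm measured before clipping, which is the value that ends up in the log, so it fluctuates with every batch even though the clipping threshold itself is fixed:

```python
# Minimal sketch of per-step gradient-norm logging with clipping (max_grad_norm = 1.0).
# Illustrative toy model; not LLaMA-Factory's or DeepSpeed's actual code path.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(3):
    x = torch.randn(8, 16)
    y = torch.randn(8, 1)
    loss = nn.functional.mse_loss(model(x), y)

    optimizer.zero_grad()
    loss.backward()

    # clip_grad_norm_ returns the total norm *before* clipping; this is the
    # number reported as grad_norm in the log, so it changes from step to
    # step even though the clipping threshold stays at 1.0.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    print(f"step {step}: grad_norm = {grad_norm.item():.4f}")
```

So a varying grad_norm is expected on its own; whether it relates to the quality gap would have to be weighed against the other visible differences between the two commands, such as --fp16 versus --bf16 True, the --lora_rank 1 / --lora_alpha 2 values versus whatever defaults finetune.py uses, and the --template default setting.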

System Info

No response

Others

No response

sunyclj · Mar 18 '24 03:03