Baichuan2: loss does not decrease during LoRA fine-tuning

Open · espectre opened this issue 2 years ago · 1 comment

With full-parameter fine-tuning the loss drops to 0.03 and the results are relatively good, but with LoRA fine-tuning the loss fluctuates between 1.2 and 1.5 and the results are poor as well. Command used:

deepspeed --hostfile=$hostfile fine-tune.py \
  --report_to tensorboard \
  --data_path "data/ysx_25588.json" \
  --model_name_or_path "../baichuan-inc/Baichuan2-13B-Chat" \
  --output_dir "output" \
  --model_max_length 1024 \
  --num_train_epochs 20 \
  --per_device_train_batch_size 16 \
  --gradient_accumulation_steps 1 \
  --save_strategy epoch \
  --learning_rate 2e-5 \
  --lr_scheduler_type constant \
  --adam_beta1 0.9 \
  --adam_beta2 0.98 \
  --adam_epsilon 1e-8 \
  --max_grad_norm 1.0 \
  --weight_decay 1e-4 \
  --warmup_ratio 0.0 \
  --logging_steps 1 \
  --gradient_checkpointing True \
  --deepspeed ds_config.json \
  --bf16 True \
  --tf32 True \
  --use_lora True
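The thread does not identify a cause. One difference between the two runs that is easy to quantify: full fine-tuning updates every entry of each d_out × d_in weight matrix, while a rank-r LoRA adapter on that matrix trains only r·(d_in + d_out) parameters (B is d_out × r, A is r × d_in). At a low rank the adapter therefore has far less capacity to fit the data than full fine-tuning does. The minimal sketch below only does that arithmetic. The layer shape and the ranks are assumed example values, not read from Baichuan2's model config or from its fine-tune.py.

```python
# Hypothetical illustration: trainable-parameter counts for full fine-tuning
# vs. LoRA on one linear layer. The shape and ranks used in __main__ are
# assumed example values, not Baichuan2's actual configuration.


def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight W (d_out x d_in) is trained."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out = 4096, 4096  # assumed layer shape, for illustration only
    full = full_params(d_in, d_out)
    for r in (1, 8, 64):  # assumed example ranks
        lora = lora_params(d_in, d_out, r)
        print(f"r={r:>3}: LoRA trains {lora:,} of {full:,} params ({lora / full:.4%})")
```

Because the adapter size grows linearly with r, a very small rank leaves only a tiny fraction of each matrix trainable, which is one reason LoRA and full fine-tuning can plateau at different losses under otherwise identical settings.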

Oct 16 '23 07:10 espectre

Did you ever find the cause?

Apr 11 '24 02:04 slliao445
