LLaMA-Factory: reward model training loss is 0

Open neverstoplearn opened this issue 1 year ago • 5 comments

[Screenshot: 2023-06-16 19-10-21] What could be causing this?

Jun 16 '23 11:06 neverstoplearn

The loss overflowed. Try lowering the learning rate to 1e-5.

Jun 16 '23 11:06 hiyouga
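
For readers unfamiliar with what "the loss overflowed" can mean under `--fp16`, here is a minimal sketch of half-precision overflow using NumPy. It only illustrates the number format; it is not taken from the training script, and the helper name `to_fp16` is hypothetical.

```python
import numpy as np

# Largest finite value representable in IEEE 754 half precision (fp16).
FP16_MAX = float(np.finfo(np.float16).max)


def to_fp16(x: float) -> np.float16:
    """Round a Python float to fp16 (hypothetical helper for illustration).

    Values whose magnitude exceeds the fp16 range are stored as +/-inf.
    """
    with np.errstate(over="ignore"):
        return np.float16(x)


# 70000 is larger than FP16_MAX, so it cannot be stored as a finite fp16 value.
big = to_fp16(70000.0)

# Once an inf appears, later arithmetic can turn it into nan.
with np.errstate(invalid="ignore"):
    diff = big - big

print(FP16_MAX, big, diff)
```

Once an intermediate value becomes `inf` or `nan`, the logged loss is no longer meaningful, which is one way a loss curve can collapse regardless of the learning rate.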

> The loss overflowed. Try lowering the learning rate to 1e-5.

The same thing still happens with the learning rate lowered to 1e-6 or 5e-7.

Jun 16 '23 13:06 neverstoplearn

It may be a hardware mismatch. Are you using a quantized model?

Jun 16 '23 13:06 hiyouga

> It may be a hardware mismatch. Are you using a quantized model?

No, I am not using a quantized model. The training command is:

CUDA_VISIBLE_DEVICES=0 python src/train_rm.py --model_name_or_path ./bloomz-560m/ --do_train --dataset comparison_gpt4_zh --finetuning_type lora --checkpoint_dir path_to_pt_checkpoint --output_dir path_to_rm_checkpoint --per_device_train_batch_size 4 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --logging_steps 10 --save_steps 1000 --learning_rate 5e-7 --num_train_epochs 1.0 --lora_target query_key_value --plot_loss --fp16

Jun 16 '23 15:06 neverstoplearn
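
Since the command above enables `--fp16`, the following sketch shows how a pairwise reward loss can evaluate to exactly 0 (or to infinity) in half precision. It assumes the common formulation `-log(sigmoid(r_chosen - r_rejected))`; whether `train_rm.py` computes its loss exactly this way is an assumption, and both function names are hypothetical. It does not reproduce the actual training run.

```python
import numpy as np


def naive_pairwise_loss_fp16(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), evaluated step by step in fp16."""
    with np.errstate(over="ignore", divide="ignore"):
        d = np.float16(r_chosen) - np.float16(r_rejected)
        # exp(-d) overflows to inf for very negative d and underflows to 0
        # for very positive d, so the sigmoid saturates at exactly 0 or 1.
        sig = np.float16(1.0) / (np.float16(1.0) + np.exp(-d))
        return float(-np.log(sig))


def stable_pairwise_loss_fp32(r_chosen: float, r_rejected: float) -> float:
    """The same loss rewritten as log(1 + exp(-d)) and evaluated in float32."""
    d = np.float32(r_chosen) - np.float32(r_rejected)
    # logaddexp(0, -d) computes log(exp(0) + exp(-d)) without overflow.
    return float(np.logaddexp(np.float32(0.0), -d))


# A large positive margin: the naive fp16 version collapses to zero,
# while the stable float32 version stays small but positive.
print(naive_pairwise_loss_fp16(20.0, 0.0), stable_pairwise_loss_fp32(20.0, 0.0))

# A large negative margin: the naive fp16 version becomes infinite,
# while the stable float32 version stays finite.
print(naive_pairwise_loss_fp16(0.0, 20.0), stable_pairwise_loss_fp32(0.0, 20.0))
```

If reward margins grow this large, lowering the learning rate alone may not help; checking whether the loss stays well behaved in full precision is one way to narrow down the cause.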

I ran into the same problem. The model is bloom-560m and the hardware is a single 24 GB 3090.

Jun 24 '23 11:06 PPnorain
