LLaMA-Factory: After DPO training of a LoRA, the model generates gibberish

Reminder

[X] I have read the README and searched the existing issues.

Reproduction

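Hello, I trained a LoRA model with SFT and it works correctly. I now want to improve it further with a DPO stage. However, after training with the script below, the model's output is gibberish (randomly repeated digits or strings). I have checked the dataset repeatedly and found no problems. What am I doing wrong?

Also, my goal is to continue training the LoRA, and I want the training output to be an improved LoRA model. The description of the adapter_name_or_path parameter says "path to sft checkpoint". Should I point it at the LoRA model, or at the model obtained by merging the LoRA into the base?

Thank you very much!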

# /root/model/base is TinyLlama/TinyLlama-1.1B-Chat-v0.6
# /root/model/lora is the LoRA trained earlier with SFT; it works correctly
CUDA_VISIBLE_DEVICES=0 deepspeed --num_gpus=1 /root/LLaMA-Factory/src/train_bash.py \
    --model_name_or_path /root/model/base \
    --adapter_name_or_path /root/model/lora \
    --dataset_dir /root/dataset/ \
    --output_dir /root/output/storyteller_1.3b \
    --dataset dpo_data \
    --flash_attn \
    --stage dpo \
    --do_train True \
    --finetuning_type lora \
    --quantization_bit 4 \
    --neftune_noise_alpha 5 \
    --template llama2 \
    --cutoff_len 2048 \
    --learning_rate 1e-4 \
    --preprocessing_num_workers 8 \
    --num_train_epochs 1.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 32 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 1 \
    --save_steps 100 \
    --warmup_steps 0 \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0.05 \
    --lora_target all \
    --bf16 True \
    --plot_loss True \
    --overwrite_output_dir True \
    --deepspeed ds_config_zero2.json

Expected behavior

No response

System Info

No response

Others

No response

Dec 29 '23 07:12 IvoryTower800

Try merging the LoRA first, then train.

Dec 29 '23 07:12 hiyouga
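
For readers following along, the merge step suggested above might look roughly like the sketch below. It assumes the `src/export_model.py` script and its `--export_dir` flag as shipped with LLaMA-Factory around this time; `/root/model/merged` is a hypothetical output path that does not appear in the thread. `--quantization_bit` is deliberately left out, on the assumption that a LoRA should be merged into unquantized base weights. The script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: merge the SFT LoRA into the base model before DPO.
BASE_MODEL=/root/model/base      # from the original command
LORA_DIR=/root/model/lora        # from the original command
MERGED_DIR=/root/model/merged    # hypothetical output path (assumption)

# Keep the command in the positional parameters so that the exact
# same argument list is printed and, optionally, executed.
set -- python /root/LLaMA-Factory/src/export_model.py \
    --model_name_or_path "$BASE_MODEL" \
    --adapter_name_or_path "$LORA_DIR" \
    --template llama2 \
    --finetuning_type lora \
    --export_dir "$MERGED_DIR"

# Dry run by default: show the command for review.
echo "$*"

# Set RUN=1 to actually perform the merge.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

If the export succeeds, `$MERGED_DIR` should contain a standalone model that can be passed to `--model_name_or_path` on its own.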

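@hiyouga OK, thank you for your reply. If I merge the LoRA first and then train, should model_name_or_path and adapter_name_or_path both point to the same merged model? Or should I simply remove adapter_name_or_path from my script and keep only model_name_or_path?

Thank you very much!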

Dec 29 '23 10:12 IvoryTower800

You only need model_name_or_path.

Dec 29 '23 12:12 hiyouga
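
As a rough illustration of this answer, the DPO launch would pass only `--model_name_or_path`, pointing at the merged model, and drop `--adapter_name_or_path`. The sketch below reuses flags from the original command; `/root/model/merged` is a hypothetical path for the merged model, and the remaining hyperparameters from the original command are omitted for brevity and can presumably be appended unchanged. With `--finetuning_type lora` and no adapter path, the run would be expected to train a new LoRA on top of the merged weights. The command is only printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: DPO on the merged model, without --adapter_name_or_path.
MERGED_DIR=/root/model/merged    # hypothetical path to the merged model (assumption)
export CUDA_VISIBLE_DEVICES=0    # same GPU selection as the original command

# Keep the command in the positional parameters so that the exact
# same argument list is printed and, optionally, executed.
set -- deepspeed --num_gpus=1 /root/LLaMA-Factory/src/train_bash.py \
    --model_name_or_path "$MERGED_DIR" \
    --dataset_dir /root/dataset/ \
    --dataset dpo_data \
    --output_dir /root/output/storyteller_1.3b \
    --stage dpo \
    --do_train True \
    --finetuning_type lora \
    --template llama2 \
    --lora_target all \
    --bf16 True \
    --deepspeed ds_config_zero2.json

# Dry run by default: show the command for review.
echo "$*"

# Set RUN=1 to actually launch training.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Note that the adapter saved in the output directory would then be relative to the merged model, not to the original base model.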

@hiyouga Thank you! I tried it the way you described, and the model's output is normal after DPO training. Thank you very much!

Dec 29 '23 14:12 IvoryTower800

What causes this? Has anyone analyzed it?

Jan 23 '24 09:01 sxm7078
