MedicalGPT: Question about DPO with multi-turn dialogue

Open chloefresh opened this issue 1 year ago • 3 comments

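I converted my multi-turn dialogue data into the format below and ran LoRA training with the DPO code. After merging the adapter, inference became slower and the model outputs repeated content. The only code change I made was replacing "prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]] with "prompt": examples["question"]. Do I also need to add an end-of-sequence token after each turn, as in multi-turn SFT?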

{"question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:", "response_chosen": "您好", "response_rejected": "您好，有什么可以帮您的吗"}
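For readers following along, the prompt change described above can be sketched as a batch mapping function. This is a minimal sketch, not the repository's actual code: the function names `build_dpo_batch` and `ends_with_open_assistant_turn` are hypothetical, and the output keys `chosen` and `rejected` are an assumption about what the DPO trainer expects. Only the `prompt` line and the three input field names come from the post.

```python
import json


def build_dpo_batch(examples):
    """Map a batch of records to prompt/chosen/rejected fields.

    The multi-turn history in "question" is passed through unchanged,
    instead of being wrapped in "Question: ... Answer: ".
    The "chosen"/"rejected" key names are an assumption.
    """
    return {
        "prompt": examples["question"],
        "chosen": examples["response_chosen"],
        "rejected": examples["response_rejected"],
    }


def ends_with_open_assistant_turn(prompt):
    """Check that the history stops at an empty Assistant turn to be completed."""
    return prompt.rstrip().endswith("Assistant:")


# One line of the training file, in the format shown in the post.
line = json.dumps({
    "question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:",
    "response_chosen": "您好",
    "response_rejected": "您好，有什么可以帮您的吗",
}, ensure_ascii=False)

record = json.loads(line)
# Batched form: each field becomes a list with one entry per record.
batch = {key: [value] for key, value in record.items()}
mapped = build_dpo_batch(batch)
```

Checking `ends_with_open_assistant_turn` on every prompt is a cheap way to catch records where the history was cut off mid-turn before they reach training.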

The parameters used were:

CUDA_VISIBLE_DEVICES=4,5,6 python dpo_training.py \
--model_type baichuan \
--model_name_or_path [base model after SFT] \
--train_file_dir ./reward \
--validation_file_dir ./reward \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--use_peft True \
--max_train_samples -1 \
--max_eval_samples -1 \
--max_steps 100 \
--eval_steps 20 \
--save_steps 50 \
--max_source_length 1024 \
--max_target_length 256 \
--output_dir outputs-dpo-v1 \
--target_modules all \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--torch_dtype float16 \
--fp16 True \
--device_map auto \
--report_to tensorboard \
--remove_unused_columns False \
--gradient_checkpointing True \
--cache_dir ./cache \
--gradient_accumulation_steps 4

Dec 26 '23 07:12 chloefresh

You can add the end-of-sequence token manually.
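One way to add the token manually is to preprocess each record before training. The following is a minimal sketch under stated assumptions: the literal `"</s>"` stands in for the model's real end-of-sequence string (in practice it would be read from `tokenizer.eos_token`), the helper names are hypothetical, and it assumes the `\n\nHuman:` / `\n\nAssistant:` markers from the data format in the question. Whether the tokenizer maps the inserted string to the special token ID depends on the tokenizer and is worth verifying.

```python
import re

# Assumption: placeholder EOS string; read tokenizer.eos_token in real use.
EOS_TOKEN = "</s>"


def add_eos_to_history(prompt, eos_token=EOS_TOKEN):
    """Append eos_token to every completed Assistant turn in the history.

    A turn counts as completed when it is followed by another "\n\nHuman:"
    marker, so the final, still-open "\n\nAssistant:" is left untouched.
    """
    def _close_turn(match):
        turn = match.group(0)
        # Skip turns that already end with the token (keeps this idempotent).
        return turn if turn.endswith(eos_token) else turn + eos_token

    pattern = r"\n\nAssistant:.*?(?=\n\nHuman:)"
    return re.sub(pattern, _close_turn, prompt, flags=re.DOTALL)


def add_eos_to_response(response, eos_token=EOS_TOKEN):
    """Append eos_token to a chosen or rejected response if it is missing."""
    return response if response.endswith(eos_token) else response + eos_token


def prepare_record(record, eos_token=EOS_TOKEN):
    """Return a copy of the record with EOS added to history and responses."""
    return {
        "question": add_eos_to_history(record["question"], eos_token),
        "response_chosen": add_eos_to_response(record["response_chosen"], eos_token),
        "response_rejected": add_eos_to_response(record["response_rejected"], eos_token),
    }


record = {
    "question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:",
    "response_chosen": "您好",
    "response_rejected": "您好，有什么可以帮您的吗",
}
prepared = prepare_record(record)
```

The open final turn is deliberately left without a token, since the chosen and rejected responses complete it and carry their own terminator.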

Dec 26 '23 10:12 shibing624

@shibing624 After DPO training finished, inference became noticeably slower. What might be causing this?

Dec 28 '23 04:12 chloefresh

I haven't noticed any particularly obvious difference.
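The thread does not settle the cause. One hypothesis that fits the symptoms (repeated output, no end-of-sequence token in training) is that the model no longer stops on its own, so each request generates more tokens rather than each token being slower. The sketch below separates the two cases by timing a call and counting new tokens. `generate_fn` is a hypothetical callable that takes a prompt and returns the list of newly generated token IDs; it is not part of MedicalGPT, and `max_new_tokens` is whatever limit the caller's generation setup uses.

```python
import time


def profile_generation(generate_fn, prompt):
    """Time one generation call and report length and throughput.

    generate_fn is a hypothetical callable: prompt -> list of new token IDs.
    """
    start = time.perf_counter()
    new_tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    count = len(new_tokens)
    return {
        "seconds": elapsed,
        "new_tokens": count,
        "tokens_per_second": count / elapsed if elapsed > 0 else float("inf"),
    }


def hit_length_limit(stats, max_new_tokens):
    """True when generation ran to the limit instead of stopping early."""
    return stats["new_tokens"] >= max_new_tokens


# Stand-in generator for illustration only: always emits 256 token IDs.
def fake_generate(prompt):
    return list(range(256))


stats = profile_generation(fake_generate, "\n\nHuman:你好\n\nAssistant:")
ran_to_limit = hit_length_limit(stats, max_new_tokens=256)
```

Running this on the same prompts for the SFT model and the merged DPO model would show which number changed. Similar `tokens_per_second` with a much larger `new_tokens` would point to a stopping problem rather than a slower model.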

Dec 28 '23 07:12 shibing624
