
Exporting the model after DPO training

hecongqing opened this issue 1 month ago · 0 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path /mnt/data/legalexp/LLM_exp/MiniCPM/MiniCPM-2B-sft-bf16 \
    --adapter_name_or_path /mnt/data/legalexp/LLM_exp/MiniCPM/minicpm_finetune_baseline_v2/output/CJOLoRA/checkpoint-5000 \
    --create_new_adapter \
    --dataset cjo \
    --dataset_dir /mnt/data/legalexp/LLM_exp/MiniCPM/minicpm_finetune_baseline_v2/data/CJOChatML_DPO/ \
    --template cpm \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ../../saves/LLaMA2-7B/lora/dpo \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --val_size 0.1 \
    --dpo_ftx 1.0 \
    --plot_loss \
    --max_samples 100 \
    --fp16

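One note on the data side: with a custom --dataset_dir, LLaMA-Factory resolves the name passed to --dataset through a dataset_info.json inside that directory, and the DPO stage needs the dataset registered as pairwise. A minimal, hypothetical sketch modeled on the repo's comparison_gpt4 entry; the file name and record layout here are assumptions, not taken from this issue:

# Hypothetical registration (file name and fields are assumptions): mark
# "cjo" as a pairwise ("ranking") dataset inside the custom --dataset_dir.
cat > /mnt/data/legalexp/LLM_exp/MiniCPM/minicpm_finetune_baseline_v2/data/CJOChatML_DPO/dataset_info.json <<'EOF'
{
  "cjo": {
    "file_name": "cjo_dpo.json",
    "ranking": true
  }
}
EOF
# Each record in cjo_dpo.json then pairs a chosen and a rejected response:
# {"instruction": "...", "input": "...", "output": ["chosen", "rejected"]}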

Expected behavior

No response

System Info

No response

Others

MiniCPM was first fine-tuned with SFT, then fine-tuned again with DPO, and the model is finally exported for inference.

At export time, should the LoRA weights be merged into the original MiniCPM model, or into the post-SFT model?

#!/bin/bash

# DO NOT use a quantized model or quantization_bit when merging LoRA weights
CUDA_VISIBLE_DEVICES=0 python ../../src/export_model.py \
    --model_name_or_path /mnt/data/legalexp/LLM_exp/MiniCPM/MiniCPM-2B-sft-bf16 \
    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/dpo \
    --template default \
    --finetuning_type lora \
    --export_dir ../../saves/minicpm_dpo \
    --export_size 2 \
    --export_legacy_format False
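
On the question above: the DPO run was launched with --adapter_name_or_path pointing at the SFT checkpoint together with --create_new_adapter, so the DPO adapter was trained on top of the base model with the SFT adapter already applied, and merging only the DPO adapter into the base (as in the command above) would drop the SFT deltas. Below is a hedged sketch of stacking both adapters at export time, assuming this version of export_model.py accepts a comma-separated --adapter_name_or_path, and reusing --template cpm from training instead of default:

# Hedged sketch, not a confirmed answer: list the SFT checkpoint first and the
# DPO adapter second, so they are merged into the base model in training order.
# The comma-separated adapter list is an assumption about this version's CLI.
CUDA_VISIBLE_DEVICES=0 python ../../src/export_model.py \
    --model_name_or_path /mnt/data/legalexp/LLM_exp/MiniCPM/MiniCPM-2B-sft-bf16 \
    --adapter_name_or_path /mnt/data/legalexp/LLM_exp/MiniCPM/minicpm_finetune_baseline_v2/output/CJOLoRA/checkpoint-5000,../../saves/LLaMA2-7B/lora/dpo \
    --template cpm \
    --finetuning_type lora \
    --export_dir ../../saves/minicpm_dpo \
    --export_size 2 \
    --export_legacy_format False

# Optional smoke test of the merged model (assumes src/cli_demo.py exists in
# the same checkout; the exported model needs no adapter flags):
CUDA_VISIBLE_DEVICES=0 python ../../src/cli_demo.py \
    --model_name_or_path ../../saves/minicpm_dpo \
    --template cpm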

hecongqing · May 21 '24 13:05