
add kto

enji-zhou opened this pull request (status: open) · 1 comment

What does this PR do?

Adds a KTO (Kahneman-Tversky Optimization) trainer: https://arxiv.org/abs/2402.01306
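For context, KTO optimizes desirable and undesirable examples independently rather than in chosen/rejected pairs. Below is a minimal, self-contained sketch of the per-batch loss from the paper (unpaired variant), written in plain Python for readability. It is an illustration of the idea, not the PR's actual implementation; the function name, the batch-mean KL estimate, and the default hyperparameters are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logps, ref_logps, is_desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Sketch of the KTO loss (https://arxiv.org/abs/2402.01306).

    policy_logps / ref_logps: per-example log-likelihoods of the response
    under the policy and the frozen reference model.
    is_desirable: per-example booleans (the KTO tag).
    """
    # Implied reward: log-ratio of policy to reference likelihoods.
    rewards = [p - r for p, r in zip(policy_logps, ref_logps)]
    # Reference point z_ref: here a batch-mean KL estimate, clamped at 0
    # (the paper uses a detached estimate over mismatched pairs).
    z_ref = max(sum(rewards) / len(rewards), 0.0)
    losses = []
    for r, good in zip(rewards, is_desirable):
        if good:
            losses.append(lambda_d * (1.0 - sigmoid(beta * (r - z_ref))))
        else:
            losses.append(lambda_u * (1.0 - sigmoid(beta * (z_ref - r))))
    return sum(losses) / len(losses)
```

Note the asymmetry: a desirable response is pushed above the reference point and an undesirable one below it, so no paired preference data is required.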


— enji-zhou, May 17 '24 05:05

Test script:

```sh
model_name_or_path=/root/qwen_/Qwen1___5-0___5B-Chat/
output_dir=/root/LLMmodels/Qwen1___5-0___5B-Chat-kto-test/

deepspeed --include localhost:3,4,5 --master_port=9909 src/train.py \
    --deepspeed /root/llama-efficient-tuning/ds_config_kpo.json \
    --stage kto \
    --kto_ftx 0.1 \
    --model_name_or_path ${model_name_or_path} \
    --do_train \
    --dataset kto-mix-test \
    --template qwen \
    --finetuning_type full \
    --output_dir ${output_dir} \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 20000 \
    --learning_rate 1e-6 \
    --num_train_epochs 2 \
    --plot_loss \
    --bf16 \
    --neftune_noise_alpha 5 \
    --cutoff_len 32000 \
    --logging_dir /root/log/
```
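Since `--dataset kto-mix-test` refers to a KTO-style dataset, it may help to note that KTO data attaches a binary desirability tag to each conversation instead of pairing a chosen and a rejected response. The snippet below builds a hypothetical entry of that shape; the field names (`messages`, `label`) and the file name are illustrative assumptions, not the exact schema LLaMA-Factory registers in `dataset_info.json`.

```python
import json

# Hypothetical KTO-style dataset: each conversation carries a boolean
# desirability tag instead of a chosen/rejected pair. Field names are
# illustrative assumptions, not LLaMA-Factory's exact schema.
examples = [
    {
        "messages": [
            {"role": "user", "content": "What is KTO?"},
            {"role": "assistant", "content": "KTO is an alignment method "
             "that learns from unpaired desirable/undesirable examples."},
        ],
        "label": True,   # desirable response
    },
    {
        "messages": [
            {"role": "user", "content": "What is KTO?"},
            {"role": "assistant", "content": "I refuse to answer."},
        ],
        "label": False,  # undesirable response
    },
]

with open("kto_mix_test.json", "w") as f:
    json.dump(examples, f, indent=2)
```

Because the tags are per-example, positive and negative samples need not come in matched pairs or equal proportions (the `--kto_ftx` and the loss weights can compensate for imbalance).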

— enji-zhou, May 17 '24 05:05

LGTM! Thanks for adding the KTO algorithm!

— hiyouga, May 17 '24 19:05