add kto
What does this PR do?
Adds the KTO trainer (Kahneman-Tversky Optimization, https://arxiv.org/abs/2402.01306).
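For context, the per-example objective from the KTO paper can be sketched as below. This is a simplified illustration (pure Python, scalar log-probabilities, names like `kto_loss` and `kl_ref` are my own), not the implementation in this PR:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, is_desirable, kl_ref,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Simplified per-example KTO loss.

    policy_logp / ref_logp: log-prob of the completion under the policy
        and the frozen reference model.
    is_desirable: True for a desirable completion, False for undesirable.
    kl_ref: the batch-level KL reference point z_ref (treated as a
        constant here; the real trainer detaches it from the graph).
    """
    log_ratio = policy_logp - ref_logp  # implicit reward (up to beta)
    if is_desirable:
        # push the reward above the reference point
        return lambda_d * (1.0 - sigmoid(beta * (log_ratio - kl_ref)))
    # push the reward below the reference point
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - log_ratio)))
```

For a desirable example, raising the policy's log-probability relative to the reference model lowers the loss; for an undesirable example the direction is reversed.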
Before submitting
- [x] Did you read the contributor guidelines?
Test script:

```shell
model_name_or_path=/root/qwen_/Qwen1___5-0___5B-Chat/
output_dir=/root/LLMmodels/Qwen1___5-0___5B-Chat-kto-test/

deepspeed --include localhost:3,4,5 --master_port=9909 src/train.py \
    --deepspeed /root/llama-efficient-tuning/ds_config_kpo.json \
    --stage kto \
    --kto_ftx 0.1 \
    --model_name_or_path ${model_name_or_path} \
    --do_train \
    --dataset kto-mix-test \
    --template qwen \
    --finetuning_type full \
    --output_dir ${output_dir} \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 20000 \
    --learning_rate 1e-6 \
    --num_train_epochs 2 \
    --plot_loss \
    --bf16 \
    --neftune_noise_alpha 5 \
    --cutoff_len 32000 \
    --logging_dir /root/log/
```
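On `--kto_ftx 0.1`: assuming it mirrors the repository's existing `--dpo_ftx` option (an assumption, not confirmed by this PR), it would mix a supervised fine-tuning term into the KTO objective, roughly:

```python
def combined_loss(kto_loss, sft_nll, kto_ftx=0.1):
    """Hypothetical combination: blend the KTO loss with the supervised
    negative log-likelihood on desirable completions, weighted by --kto_ftx.
    Names and behavior are assumed by analogy with --dpo_ftx."""
    return kto_loss + kto_ftx * sft_nll
```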
LGTM! Thanks for adding the KTO algorithm!