trl
KTOTrainer vs kto_loss in DPOTrainer
What's the difference between KTOTrainer and kto_loss in DPOTrainer? If I want to fine-tune with KTO, which should I use?
cc @kashif
I'm also curious about this. Could we use the kto_pair loss instead, since KTOTrainer seems to have errors for now?
@liangxuZhang You should use KTOTrainer. kto_loss is a simplified version of KTO that doesn't work as well.
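For illustration, the difference between the two can be sketched in plain Python. This is a minimal sketch, assuming the loss as written in the KTO paper: each completion carries its own desirable/undesirable label, and a KL reference point z0 (clamped at zero) is estimated separately. The function names, `beta`, and weight defaults are hypothetical and are neither TRL's API nor its exact implementation. `kto_pair_loss` approximates the paired `kto_pair` variant, where the reference point for the chosen half is taken from the rejected half of the batch and vice versa.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    logratios: Sequence[float],
    labels: Sequence[bool],
    kl_estimate: float,
    beta: float = 0.1,
    desirable_weight: float = 1.0,
    undesirable_weight: float = 1.0,
) -> List[float]:
    """Per-example KTO-style loss for *unpaired* data (illustrative sketch).

    logratios[i] = log pi_theta(y_i|x_i) - log pi_ref(y_i|x_i)
    labels[i]    = True if y_i is desirable, False if undesirable
    kl_estimate  = reference point z0, assumed to be estimated elsewhere
    """
    if len(logratios) != len(labels):
        raise ValueError("logratios and labels must have the same length")
    z0 = max(kl_estimate, 0.0)  # the KL reference point is clamped at zero
    losses = []
    for r, desirable in zip(logratios, labels):
        if desirable:
            # push the log-ratio of desirable completions above z0
            losses.append(desirable_weight * (1.0 - sigmoid(beta * (r - z0))))
        else:
            # push the log-ratio of undesirable completions below z0
            losses.append(undesirable_weight * (1.0 - sigmoid(beta * (z0 - r))))
    return losses


def kto_pair_loss(
    chosen_logratios: Sequence[float],
    rejected_logratios: Sequence[float],
    beta: float = 0.1,
) -> List[float]:
    """Paired simplification (hypothetical sketch of the kto_pair idea).

    Needs DPO-style chosen/rejected pairs; each half's reference point is
    the clamped mean log-ratio of the *other* half of the batch.
    """
    if len(chosen_logratios) != len(rejected_logratios):
        raise ValueError("chosen and rejected batches must have the same length")
    chosen_kl = max(sum(chosen_logratios) / len(chosen_logratios), 0.0)
    rejected_kl = max(sum(rejected_logratios) / len(rejected_logratios), 0.0)
    chosen_losses = [1.0 - sigmoid(beta * (r - rejected_kl)) for r in chosen_logratios]
    rejected_losses = [1.0 - sigmoid(beta * (chosen_kl - r)) for r in rejected_logratios]
    return chosen_losses + rejected_losses


if __name__ == "__main__":
    unpaired = kto_loss([0.5, -0.3, 1.2], [True, False, True], kl_estimate=0.2)
    paired = kto_pair_loss([0.5, 1.2], [-0.3, -0.8])
    print(sum(unpaired) / len(unpaired), sum(paired) / len(paired))
```

The practical consequence is the data requirement: the unpaired form only needs a binary label per completion, so it can train on data with no matched chosen/rejected pairs, while the paired variant needs the same preference pairs DPO uses.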
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.