trl

KTOTrainer vs kto_loss in DPO-Trainer

Open liangxuZhang opened this issue 1 year ago • 4 comments

What's the difference between KTOTrainer and kto_loss in DPOTrainer? If I want to fine-tune with KTO, which should I use?

liangxuZhang, Mar 01 '24 10:03

cc @kashif

younesbelkada, Mar 04 '24 01:03

I'm also curious about this. Could we use the kto_pair loss instead, since there seem to be errors in KTOTrainer for now?

hbin0701, Mar 06 '24 00:03

@liangxuZhang you should use KTOTrainer. kto_loss is a simplified version of KTO that doesn't work as well.
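One practical difference when switching trainers is the data layout. The following is only a minimal sketch, assuming the column names described in the TRL documentation: paired rows with `prompt`, `chosen`, `rejected` for DPOTrainer, and unpaired rows with `prompt`, `completion`, `label` for KTOTrainer. `paired_to_unpaired` is a hypothetical helper written for illustration and is not part of TRL.

```python
from typing import Dict, List, Union

# One unpaired example: a prompt, a single completion, and a boolean label.
Record = Dict[str, Union[str, bool]]


def paired_to_unpaired(pairs: List[Dict[str, str]]) -> List[Record]:
    """Split each (prompt, chosen, rejected) row into two labelled rows.

    Hypothetical helper: the chosen completion is marked desirable
    (label=True) and the rejected one undesirable (label=False).
    """
    unpaired: List[Record] = []
    for row in pairs:
        unpaired.append(
            {"prompt": row["prompt"], "completion": row["chosen"], "label": True}
        )
        unpaired.append(
            {"prompt": row["prompt"], "completion": row["rejected"], "label": False}
        )
    return unpaired


# Toy paired preference data in the DPO-style layout.
pairs = [{"prompt": "2 + 2 =", "chosen": "4", "rejected": "5"}]
rows = paired_to_unpaired(pairs)
```

Rows in this shape could then be loaded into a dataset and passed to KTOTrainer, assuming the installed TRL version expects these column names; check the documentation for your version before relying on this.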

kawine, Mar 09 '24 06:03

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

github-actions[bot], Apr 02 '24 15:04