LLaMA-Factory

Are there plans to support KTO?

Open haochen2115 opened this issue 1 year ago • 9 comments

The work at https://github.com/ContextualAI/HALOs reports that KTO outperforms DPO and PPO and does not require a paired dataset.

haochen2115 avatar Jan 04 '24 06:01 haochen2115
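
To make the "no paired dataset" point concrete, here is a minimal sketch of the two data shapes. The field names (`prompt`, `chosen`, `rejected`, `completion`, `label`) and the helper `unpair_preferences` are assumptions for illustration only, not the actual schema of HALOs or LLaMA-Factory.

```python
from typing import Dict, List


def unpair_preferences(pairs: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Split paired preference records into independent labeled examples.

    A DPO-style record needs a chosen AND a rejected answer for the same
    prompt. A KTO-style record only needs one answer plus a binary label
    saying whether that answer is desirable.
    """
    examples: List[Dict[str, object]] = []
    for pair in pairs:
        # The preferred answer becomes a desirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["chosen"], "label": True}
        )
        # The dispreferred answer becomes an undesirable example.
        examples.append(
            {"prompt": pair["prompt"], "completion": pair["rejected"], "label": False}
        )
    return examples


# Hypothetical paired (DPO-style) data: two answers per prompt are required.
paired = [{"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"}]

# Unpaired (KTO-style) data: each record stands alone.
unpaired = unpair_preferences(paired)

# Feedback with no matching counterpart (e.g. a single thumbs-down)
# can be added directly, which is not possible in the paired format.
unpaired.append(
    {"prompt": "Name a prime number.", "completion": "9", "label": False}
)
```

Existing paired data can always be converted this way, while the reverse is not possible in general, which is why unpaired feedback tends to be cheaper to collect.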

That seems to make sense; in practice, paired datasets are expensive to obtain.

WhiteFu avatar Jan 05 '24 11:01 WhiteFu

The DPO trainer's loss appears to include a KTO option, but I don't know whether training with it actually works.

chaunceyliu30 avatar Jan 10 '24 07:01 chaunceyliu30

This is really appealing. Looking forward to the author integrating it!

Pattaro avatar Jan 18 '24 07:01 Pattaro

Looking forward to it too!

JerryDaHeLian avatar Mar 18 '24 06:03 JerryDaHeLian

@hiyouga are there any plans to schedule this?

zhufz avatar Apr 12 '24 08:04 zhufz

@hiyouga is there any plan for this?

kriti-hippo avatar May 03 '24 20:05 kriti-hippo

Very interested as well. Multiple research papers have confirmed at this point that KTO is superior to DPO in many ways.

HideLord avatar May 04 '24 16:05 HideLord

> Very interested as well. Multiple research papers have confirmed at this point that KTO is superior to DPO in many ways.

Can you share a few of these papers, please? Thank you very much.

benben1999 avatar May 08 '24 03:05 benben1999

fixed in https://github.com/hiyouga/LLaMA-Factory/pull/3785

hiyouga avatar May 18 '24 14:05 hiyouga