LLaMA-Factory
Are there plans to support KTO?
The work at https://github.com/ContextualAI/HALOs reports that KTO outperforms both DPO and PPO, and that it does not require a paired dataset.
That seems to make sense, since paired preference datasets are expensive to collect in practice.
The DPO trainer's loss appears to include a KTO variant, but I don't know whether it actually trains successfully.
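For context, here is a minimal sketch of what the KTO objective looks like, assuming the HALOs formulation (Ethayarajh et al., 2024): each completion carries only a binary desirable/undesirable label, and the loss pushes the policy's implied reward above or below a KL reference point. This is a simplified illustration, not any trainer's actual implementation; in particular, the paper estimates the KL term from mismatched prompt-completion pairs, while the in-batch mean below is a shortcut.

```python
import torch

def kto_loss(policy_logps, ref_logps, labels, beta=0.1,
             desirable_weight=1.0, undesirable_weight=1.0):
    # policy_logps / ref_logps: per-completion summed log-probs under the
    # policy and the frozen reference model, shape (batch,).
    # labels: 1 for desirable completions, 0 for undesirable ones; note
    # that no pairing between the two groups is required.
    logratios = policy_logps - ref_logps  # implied reward r_theta(x, y)

    # Reference point z_0: a KL estimate used as a baseline, clamped at 0
    # and detached so no gradient flows through it. (The paper estimates
    # it from mismatched (x, y') pairs; this in-batch mean is a shortcut.)
    kl = logratios.mean().clamp(min=0).detach()

    desirable = labels.bool()
    losses = torch.where(
        desirable,
        desirable_weight * (1 - torch.sigmoid(beta * (logratios - kl))),
        undesirable_weight * (1 - torch.sigmoid(beta * (kl - logratios))),
    )
    return losses.mean()
```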
Very tempting indeed, looking forward to the author integrating it!
Also looking forward to this!
@hiyouga is there any plan for this?
Very interested as well. Multiple research papers have confirmed at this point that KTO is superior to DPO in many ways.
Can you share a few of these papers, please? Thank you very much.
Yeah, here are two recent ones:
- Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
fixed in https://github.com/hiyouga/LLaMA-Factory/pull/3785
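For anyone landing here later, the practical difference on the data side is worth spelling out: DPO needs a chosen and a rejected response for the same prompt, while KTO only needs one response plus a binary label. A rough illustration of the two shapes follows; the field names are hypothetical and not LLaMA-Factory's exact schema, so check the PR and docs for the real format.

```python
# Paired record, as required by DPO: both a preferred and a
# dispreferred response must exist for the same prompt.
dpo_example = {
    "prompt": "Explain KTO in one sentence.",
    "chosen": "KTO aligns a model using per-example desirability labels.",
    "rejected": "KTO is a kind of optimizer.",
}

# Unpaired records, sufficient for KTO: one completion plus a binary
# desirability tag, so chosen/rejected pairs never have to be
# collected for the same prompt.
kto_examples = [
    {"prompt": "Explain KTO in one sentence.",
     "completion": "KTO aligns a model using per-example labels.",
     "label": True},
    {"prompt": "What is DPO?",
     "completion": "A city in Norway.",
     "label": False},
]
```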