LLaMA-Factory
Are there plans to support KTO?
The work at https://github.com/ContextualAI/HALOs reports that KTO outperforms both DPO and PPO, and that it does not require a paired dataset.
That seems to make sense, since paired preference datasets are expensive to collect in practice.
The DPO trainer's loss appears to include a KTO variant, but I don't know whether it actually trains successfully.
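For context, here is a minimal sketch of what the KTO objective looks like, assuming the HALOs formulation (Ethayarajh et al., 2024): each completion carries only a binary desirable/undesirable label, and the loss pushes the policy's implied reward above or below a KL reference point. This is a simplified illustration, not any trainer's actual implementation; in particular, the paper estimates the KL term from mismatched prompt-completion pairs, while the in-batch mean below is a shortcut.

```python
import torch

def kto_loss(policy_logps, ref_logps, labels, beta=0.1,
             desirable_weight=1.0, undesirable_weight=1.0):
    # policy_logps / ref_logps: per-completion summed log-probs under the
    # policy and the frozen reference model, shape (batch,).
    # labels: 1 for desirable completions, 0 for undesirable ones; note
    # that no pairing between the two groups is required.
    logratios = policy_logps - ref_logps  # implied reward r_theta(x, y)

    # Reference point z_0: a KL estimate used as a baseline, clamped at 0
    # and detached so no gradient flows through it. (The paper estimates
    # it from mismatched (x, y') pairs; this in-batch mean is a shortcut.)
    kl = logratios.mean().clamp(min=0).detach()

    desirable = labels.bool()
    losses = torch.where(
        desirable,
        desirable_weight * (1 - torch.sigmoid(beta * (logratios - kl))),
        undesirable_weight * (1 - torch.sigmoid(beta * (kl - logratios))),
    )
    return losses.mean()
```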
Very tempting indeed, looking forward to the author integrating it!
Also looking forward to this!
@hiyouga is there any plan for this?
Very interested as well. Multiple research papers have confirmed at this point that KTO is superior to DPO in many ways.
Can you share a few of these papers, please? Thank you very much.
Yeah, here are two recent ones:
- Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
fixed in https://github.com/hiyouga/LLaMA-Factory/pull/3785
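For anyone landing here later, the practical difference on the data side is worth spelling out: DPO needs a chosen and a rejected response for the same prompt, while KTO only needs one response plus a binary label. A rough illustration of the two shapes follows; the field names are hypothetical and not LLaMA-Factory's exact schema, so check the PR and docs for the real format.

```python
# Paired record, as required by DPO: both a preferred and a
# dispreferred response must exist for the same prompt.
dpo_example = {
    "prompt": "Explain KTO in one sentence.",
    "chosen": "KTO aligns a model using per-example desirability labels.",
    "rejected": "KTO is a kind of optimizer.",
}

# Unpaired records, sufficient for KTO: one completion plus a binary
# desirability tag, so chosen/rejected pairs never have to be
# collected for the same prompt.
kto_examples = [
    {"prompt": "Explain KTO in one sentence.",
     "completion": "KTO aligns a model using per-example labels.",
     "label": True},
    {"prompt": "What is DPO?",
     "completion": "A city in Norway.",
     "label": False},
]
```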