1 issue by Hao Chen

The work at https://github.com/ContextualAI/HALOs reports that KTO outperforms DPO and PPO, and that it does not require a paired dataset.
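
For reference, the reason KTO can train on unpaired data is that its loss scores each completion independently against a reference point, rather than comparing a chosen/rejected pair as DPO does. Below is a minimal PyTorch sketch of that per-example loss, loosely following the KTO paper behind the HALOs repo; the function name, hyperparameter defaults, and the simplified batch-level estimate of the reference point `z0` are assumptions for illustration, not the repo's actual API:

```python
import torch

def kto_loss(policy_logps: torch.Tensor,   # sum log-probs of each completion under the policy
             ref_logps: torch.Tensor,      # same completions under the frozen reference model
             is_desirable: torch.Tensor,   # bool: thumbs-up (True) or thumbs-down (False)
             beta: float = 0.1,
             lambda_d: float = 1.0,
             lambda_u: float = 1.0) -> torch.Tensor:
    # Implicit reward: log-ratio of policy to frozen reference model.
    rewards = policy_logps - ref_logps
    # Reference point z0: a batch-level KL estimate, detached and clamped
    # at zero (a rough stand-in for the paper's estimator, an assumption here).
    z0 = rewards.mean().clamp(min=0).detach()
    # Each example only needs a binary desirable/undesirable label, so no
    # (chosen, rejected) pairing of completions is required, unlike DPO.
    loss_desirable = lambda_d * (1 - torch.sigmoid(beta * (rewards - z0)))
    loss_undesirable = lambda_u * (1 - torch.sigmoid(beta * (z0 - rewards)))
    return torch.where(is_desirable, loss_desirable, loss_undesirable).mean()
```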

Labels: enhancement, pending