llm-foundry Any plan for supporting DPO?


Open · lorabit110 opened this issue 1 year ago · 1 comment

🚀 Feature Request

Support DPO (Direct Preference Optimization) loss and data loader.
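For reference, the requested loss is the standard DPO objective: for a prompt x with a preferred response y_w and a rejected response y_l, it minimizes -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), where pi is the trainable policy and pi_ref is a frozen reference model. Below is a minimal, framework-free sketch. It assumes the per-example summed response log-probabilities have already been computed. The names `dpo_loss` and `log_sigmoid` and the default `beta=0.1` are illustrative only and are not part of llm-foundry's API. A real implementation would work on batched tensors.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    ref_chosen_logps: Sequence[float],
    ref_rejected_logps: Sequence[float],
    beta: float = 0.1,  # illustrative default; strength of the implicit KL constraint
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical helper).

    Each entry is the summed log-probability of a full response (chosen or
    rejected) given its prompt, under the policy or the frozen reference.
    """
    n = len(policy_chosen_logps)
    if not (n == len(policy_rejected_logps) == len(ref_chosen_logps) == len(ref_rejected_logps)):
        raise ValueError("all inputs must have the same length")
    if n == 0:
        raise ValueError("empty batch")

    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        # Margin: how much more the policy favors chosen over rejected,
        # measured relative to the reference model.
        logits = beta * ((pc - rc) - (pr - rr))
        total += -log_sigmoid(logits)
    return total / n
```

When the policy still equals the reference, every margin is zero and the loss is log 2. Only log-probabilities from two forward passes are needed. No reward model or sampling loop is involved.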

Motivation

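Many recent open LLMs have achieved promising results by using DPO instead of RL-style tuning such as PPO for alignment. DPO also seems to require fewer changes to llm-foundry than full RLHF, since it trains directly on offline preference pairs and needs neither a separate reward model nor online sampling from the policy.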
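On the data side, each DPO training record is a static (prompt, chosen, rejected) triple, so a data loader mainly has to tokenize two sequences per record and mask the prompt out of the labels. The sketch below illustrates that step on plain token-id lists. `PreferenceExample`, `build_pair`, and `IGNORE_INDEX` are hypothetical names, not llm-foundry APIs. The value -100 is assumed because it is PyTorch's default `ignore_index` for cross-entropy.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: the trainer ignores label positions with this value
# (PyTorch's cross_entropy uses -100 as its default ignore_index).
IGNORE_INDEX = -100


@dataclass
class PreferenceExample:
    """Hypothetical record: one prompt with a preferred and a rejected reply."""
    prompt_ids: List[int]
    chosen_ids: List[int]
    rejected_ids: List[int]


def build_pair(
    ex: PreferenceExample, max_len: int, pad_id: int
) -> Tuple[Tuple[List[int], List[int]], Tuple[List[int], List[int]]]:
    """Return ((input_ids, labels) for chosen, (input_ids, labels) for rejected).

    Labels are aligned with input_ids (any next-token shift is left to the
    loss code). Prompt and padding positions are masked with IGNORE_INDEX so
    only response tokens count toward the log-probs used by DPO.
    """
    def one(response: List[int]) -> Tuple[List[int], List[int]]:
        ids = (ex.prompt_ids + response)[:max_len]
        labels = ([IGNORE_INDEX] * len(ex.prompt_ids) + response)[:max_len]
        pad = max_len - len(ids)
        return ids + [pad_id] * pad, labels + [IGNORE_INDEX] * pad

    return one(ex.chosen_ids), one(ex.rejected_ids)


ex = PreferenceExample(prompt_ids=[1, 2], chosen_ids=[3, 4], rejected_ids=[5])
(chosen_ids, chosen_labels), (rejected_ids, rejected_labels) = build_pair(ex, max_len=5, pad_id=0)
```

Keeping chosen and rejected sequences padded to the same length lets them be stacked into one batch, so a single forward pass can produce all four log-probability terms the loss needs.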

Jan 08 '24 19:01 lorabit110

Same question here.

May 09 '24 14:05 pretidav
