Any plan for supporting DPO?

Open: lorabit110 opened this issue 1 year ago • 1 comment

🚀 Feature Request

Support DPO (Direct Preference Optimization) loss and data loader.
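
For context, here is a minimal sketch of the per-pair DPO loss this request refers to. It assumes the summed sequence log-probabilities of the chosen and rejected responses are already available from both the policy and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, not part of llm-foundry.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All inputs are summed log-probabilities of a full response (hypothetical
    inputs; a real trainer would compute them from model logits).
    """
    # Log-ratio of policy to reference model for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

A data loader for this loss would need to yield (prompt, chosen, rejected) triples rather than single sequences, which is presumably the second half of the request.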

Motivation

Many recent open LLMs have achieved promising results by using DPO for alignment instead of RL-style tuning such as PPO. It also seems to require fewer changes to llm-foundry than RLHF would.

lorabit110, Jan 08 '24 19:01

same question here

pretidav, May 09 '24 14:05