
Add DPO support for DeepSpeed-Chat

Open stceum opened this issue 2 years ago • 1 comments

Considering the advantages of DPO (Direct Preference Optimization), described as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning", we are adding DPO support to DeepSpeed-Chat.
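For context, the core of DPO is a single loss over preference pairs, computed directly from policy and reference-model log-probabilities with no separate reward model. Below is a minimal, dependency-free sketch of that loss (not the DeepSpeed-Chat implementation); the function name and scalar-input signature are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Illustrative per-pair DPO loss:
    -log sigmoid(beta * ((log pi - log pi_ref)[chosen]
                         - (log pi - log pi_ref)[rejected]))
    Inputs are sequence log-probabilities of the chosen/rejected
    responses under the policy and the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(x) = log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# When the policy matches the reference, both log-ratios are zero
# and the loss is log(2); favoring the chosen response lowers it.
```

In a real training loop these log-probabilities would come from summing per-token logprobs of each response, batched over tensors rather than scalars, but the loss formula is the same.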

stceum avatar Dec 08 '23 15:12 stceum

Accidentally closed the PR. Sorry :(

stceum avatar Jan 27 '24 06:01 stceum