DeepSpeedExamples
Add DPO support for DeepSpeed-Chat
Given the advantages of DPO (Direct Preference Optimization), described by its authors as "stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning", this PR adds DPO support to DeepSpeed-Chat.
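For context, the objective behind the quoted claims can be sketched as follows. This is a minimal, framework-free illustration of the standard DPO loss from the paper, not code from this PR; the function and argument names are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference
    model. beta scales the implicit KL constraint toward the reference.
    """
    # Log-ratio of policy to reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # DPO logit: how much more the policy prefers the chosen response
    # over the rejected one, relative to the reference model
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), computed stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))
```

Because the reward is defined implicitly by these log-ratios, no separate reward model or on-policy sampling is needed during fine-tuning, which is the "computationally lightweight" property the description quotes.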
Accidentally closed the PR. Sorry :(