SPPO
DPO baseline implementation
Dear authors, how can we train the iterative DPO baseline model using this repo? Is there a convenient way to modify the SPPO code for that purpose?
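While waiting for the authors' answer: since iterative DPO differs from SPPO mainly in the per-pair training objective, one plausible starting point is to swap the SPPO loss for the standard DPO loss and keep the same generate/rank/train loop. The function and variable names below are illustrative, not taken from this repo; this is only a minimal sketch of the DPO objective for a single preference pair, not the repo's actual implementation.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - pi_r) - (ref_c - ref_r))),
    where each argument is a summed sequence log-probability."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(logits)); small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# With no margin over the reference model, the loss is log(2):
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))  # ~0.6931
```

In practice the iterative part would re-generate responses and re-rank them with the preference model after each round, then retrain with this loss; TRL's `DPOTrainer` already implements the loss itself, so only the outer loop would need adapting.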