SPPO
DPO baseline implementation
Dear authors, how can we train the iterative DPO baseline model using this repo? Is there a convenient way to modify the SPPO code for that purpose?
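While waiting for the authors' answer: since iterative DPO differs from SPPO mainly in the per-pair training objective, one plausible starting point is to swap the SPPO loss for the standard DPO loss and keep the same generate/rank/train loop. The function and variable names below are illustrative, not taken from this repo; this is only a minimal sketch of the DPO objective for a single preference pair, not the repo's actual implementation.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - pi_r) - (ref_c - ref_r))),
    where each argument is a summed sequence log-probability."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(logits)); small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# With no margin over the reference model, the loss is log(2):
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))  # ~0.6931
```

In practice the iterative part would re-generate responses and re-rank them with the preference model after each round, then retrain with this loss; TRL's `DPOTrainer` already implements the loss itself, so only the outer loop would need adapting.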