Direct Preference Optimization

Open Reichenbachian opened this issue 1 year ago • 4 comments

🚀 The feature, motivation, and pitch

Hey all! Appreciate the work.

Is there any word on whether DPO (Direct Preference Optimization) will be integrated into the trlx library soon?

Alternatives

No response

Additional context

No response

Reichenbachian avatar Jun 12 '23 12:06 Reichenbachian

@Reichenbachian I think a version of DPO is currently under review in the TRL library, if you want to check: https://github.com/lvwerra/trl/pull/416/files#diff-5bbdb5d54108f2162b47bc54dc23c7b8e7744d2941118e60a44c161a4acc0ee8

Forbu avatar Jun 14 '23 15:06 Forbu

I wonder if there are any updates on implementing DPO features in trlx. Many thanks!

CSerxy avatar Jul 25 '23 23:07 CSerxy

There haven't been any updates on that. AFAIK nobody is currently working on it, so feel free to pick it up if you want!

maxreciprocate avatar Jul 26 '23 03:07 maxreciprocate

Hi, is this something that is still open to work on? I would like to pick it up if that is okay :)

@CSerxy I've just forked and begun work on this feature. Let me know if this conflicts with your plans.

sandeepchittilla avatar Jul 26 '23 09:07 sandeepchittilla