torchtune

Feature Request: ORPO

Open · nivibilla opened this issue 4 months ago · 3 comments

Hi,

First of all, thank you for this library! It's very clean, and I appreciate that all I need is PyTorch!

I wanted to open an issue for integrating ORPO. Not needing to do SFT before the RLHF step is huge, since it saves a lot of compute when training on preference data. I'm hoping it can be integrated into torchtune (with LoRA support, if possible)!

There is an existing integration in TRL.
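For context on the SFT point: ORPO folds supervised fine-tuning and preference alignment into a single loss. The ordinary negative log-likelihood on the chosen response plays the role of SFT, and an odds-ratio term pushes the model to give the chosen response higher odds than the rejected one, without a reference model. Below is a minimal, framework-free sketch of that per-example objective, assuming the length-normalized likelihood used in the ORPO paper and per-token log-probabilities already computed by the model. The function names and the weight `lam` are hypothetical; this is not torchtune's or TRL's API.

```python
import math


def mean_logprob(token_logprobs: list[float]) -> float:
    """Length-normalized log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t)."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logp: float) -> float:
    """log(P / (1 - P)) with P = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    """Hypothetical per-example ORPO loss: NLL(chosen) + lam * odds-ratio loss.

    Both arguments are per-token log-probabilities of a response given the
    prompt. Returns (total, nll, or_loss).
    """
    chosen_avg = mean_logprob(chosen_logprobs)
    rejected_avg = mean_logprob(rejected_logprobs)

    # SFT term: ordinary negative log-likelihood on the chosen response only.
    nll = -chosen_avg

    # Preference term: -log sigmoid(log odds(chosen) - log odds(rejected)).
    log_odds_ratio = log_odds(chosen_avg) - log_odds(rejected_avg)
    or_loss = -log_sigmoid(log_odds_ratio)

    return nll + lam * or_loss, nll, or_loss


total, nll, or_loss = orpo_loss([-0.5, -0.7], [-1.5, -2.0])
print(f"total={total:.4f} nll={nll:.4f} or_loss={or_loss:.4f}")
```

When the chosen response already has higher odds than the rejected one, the odds-ratio term falls below log 2, and swapping the pair pushes it above log 2; with `lam=0` the objective reduces to plain SFT. A real recipe would compute these over batched tensors, and the weight is a tunable hyperparameter (TRL's `ORPOConfig` calls it `beta`).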

Thanks!

nivibilla, Apr 28 '24 19:04

@nivibilla thanks for opening this issue!

ORPO would indeed be a really nice addition to the library. It hasn't been at the top of our list, if I'm being honest, but maybe we should reconsider. Is this something you'd be open to adding? DPO and PPO (WIP) were both added by our awesome community members, and if you'd be interested in adding ORPO, I'm happy to help brainstorm and review the design, code, etc.

kartikayk, Apr 29 '24 15:04

Hi @kartikayk

I'm not that experienced in writing custom training loops; I'm mainly a Hugging Face user, haha. I'd be no better than Llama 3 70B attempting it 🤣

nivibilla, Apr 30 '24 21:04

I'm partly serious here, but why not train a CodeLlama 70B using torchtune and then see if it gets you the right recipe :)

kartikayk, Apr 30 '24 23:04