
Feature Request: Full Support for Direct Preference Optimization (DPO)

Open pipSu opened this issue 1 year ago • 3 comments

Hello, I'm interested in knowing if there are any plans to implement Full support for Direct Preference Optimization (DPO) in the upcoming releases.

Are there any current efforts or roadmap items related to this, or is it something that might be considered in future updates?

Thank you for your time and consideration.

pipSu avatar Sep 10 '24 07:09 pipSu

Hey @pipSu! We currently have a recipe for LoRA-based DPO (https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_single_device.py). By "full" do you mean non-LoRA based full-finetuning?
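For context, the loss that a DPO recipe optimizes is the same whether the trainable parameters come from LoRA adapters or from full fine-tuning; only the set of updated weights differs. Here is a minimal pure-Python sketch of the per-example DPO objective (the `dpo_loss` helper and its argument names are illustrative, not torchtune's API): given log-probabilities of the chosen and rejected responses under the policy and a frozen reference model, the loss is the negative log-sigmoid of the scaled reward margin.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards are log-prob differences relative to the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # DPO loss: -log sigmoid(margin); small when chosen is preferred by the policy.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At a zero margin the loss is `log 2`, and it decreases monotonically as the policy assigns relatively more probability to the chosen response than the reference does.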

salmanmohammadi avatar Sep 10 '24 09:09 salmanmohammadi

> Hey @pipSu! We currently have a recipe for LoRA-based DPO (https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_single_device.py). By "full" do you mean non-LoRA based full-finetuning?

Thanks for the reply. Yes, I am asking about support for DPO with full fine-tuning, not LoRA-based.

pipSu avatar Sep 10 '24 13:09 pipSu

Currently this isn't on our radar, so I can't promise when you might see it, but it's helpful for us to see there's interest in this for the future. For now, if you'd be interested at all in contributing a full fine-tuning recipe, we'd be more than happy to support you!

salmanmohammadi avatar Sep 10 '24 13:09 salmanmohammadi

We now support this.

joecummings avatar Apr 22 '25 21:04 joecummings