Feature Request: Full Support for Direct Preference Optimization (DPO)
Hello, I'd like to know whether there are any plans to implement full support for Direct Preference Optimization (DPO) in an upcoming release.
Are there any current efforts or roadmap items related to this, or is it something that might be considered in future updates?
Thank you for your time and consideration.
Hey @pipSu! We currently have a recipe for LoRA-based DPO (https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_single_device.py). By "full" do you mean non-LoRA based full-finetuning?
Thanks for the reply. Yes, I'm asking about support for DPO with full fine-tuning, not LoRA-based DPO.
Currently this isn't on our radar, so I can't promise anything about when you might see it, but it's helpful for us to know there's interest in this for the future. For now, if you'd be interested at all in contributing a full fine-tuning recipe, we'd be more than happy to support you!
We now support this.
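For anyone landing on this thread: both the LoRA and full fine-tuning recipes optimize the same DPO objective from the original paper. A minimal sketch of the per-pair loss is below; this is an illustrative implementation, not torchtune's actual code, and the argument names are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected completion under the trainable policy or the frozen
    reference model; beta controls the KL penalty strength.
    """
    # Implicit rewards are beta-scaled log-ratios against the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written stably as log(1 + e^{-margin}).
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree, the loss is log 2; as the policy assigns relatively more probability to the chosen completion, the loss decreases.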