Feature Request: Full Support for Direct Preference Optimization (DPO)
Hello, I'd like to know whether there are any plans to implement full support for Direct Preference Optimization (DPO) in an upcoming release.
Are there any current efforts or roadmap items related to this, or is it something that might be considered in future updates?
Thank you for your time and consideration.
Hey @pipSu! We currently have a recipe for LoRA-based DPO (https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_single_device.py). By "full" do you mean non-LoRA based full-finetuning?
Thanks for the reply. Yes, I'm asking about support for DPO with full fine-tuning, not LoRA-based DPO.
Currently this isn't on our radar, so I can't promise anything about when you might see it, but it's helpful for us to know there's interest in this for the future. For now, if you'd be interested at all in contributing a full fine-tuning recipe, we'd be more than happy to support you!
We now support this.
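For anyone landing on this thread: both the LoRA and full fine-tuning recipes optimize the same DPO objective from the original paper. A minimal sketch of the per-pair loss is below; this is an illustrative implementation, not torchtune's actual code, and the argument names are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected completion under the trainable policy or the frozen
    reference model; beta controls the KL penalty strength.
    """
    # Implicit rewards are beta-scaled log-ratios against the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written stably as log(1 + e^{-margin}).
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree, the loss is log 2; as the policy assigns relatively more probability to the chosen completion, the loss decreases.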