llama-recipes
DPO Fine-tuning
🚀 The feature, motivation and pitch
Is it possible to adapt the fine-tuning script for DPO (Direct Preference Optimization) fine-tuning? The current version seems to support only next-token-prediction fine-tuning.
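To make the request concrete, here is a rough sketch of how the DPO objective differs from a next-token loss: it scores a (chosen, rejected) response pair against a frozen reference model instead of scoring a single target sequence. This is only an illustration of the standard DPO loss for one pair, not anything taken from llama-recipes. The function name `dpo_loss`, its parameters, and the default `beta=0.1` are all assumptions made for the example, and the inputs are assumed to be summed per-sequence log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected" ...
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # ... compared with how much the reference model already does.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)

    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy prefers the chosen response more than the reference does,
# so the loss falls below log(2), the value at zero margin.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

So compared with the current script, the data loader would need to yield prompt/chosen/rejected triples, and a second (frozen) copy of the model would be needed to produce the reference log-probabilities.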
Alternatives
No response
Additional context
No response
Thanks for the feedback! We are working on some examples and will let you know once they are integrated!