llama-recipes

Published 20 hours ago

DPO Fine-tuning

Open: jens5588 opened this issue 9 months ago • 1 comment

🚀 The feature, motivation and pitch

Is it possible to adapt the fine-tuning script for DPO (Direct Preference Optimization) fine-tuning? The current version seems to support only next-token-prediction fine-tuning.
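To make the difference concrete: next-token-prediction fine-tuning minimizes cross-entropy on a single target sequence, whereas DPO scores a (chosen, rejected) response pair against a frozen reference model. Below is a minimal sketch of the per-pair DPO loss, assuming the summed sequence log-probabilities have already been computed. The function names, argument names, and the default `beta` are illustrative and not part of the llama-recipes script.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model
    (hypothetical inputs; computing them is not shown here).
    """
    # How much more the policy prefers the chosen response than the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference margin under the reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)) is equal to softplus(-logits).
    return softplus(-logits)


if __name__ == "__main__":
    # Policy favors the chosen response more than the reference does,
    # so the loss falls below log(2).
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy and the reference assign the same margin, the loss equals log(2); it decreases as the policy favors the chosen response more strongly than the reference does. A training script would therefore need paired preference data and a second, frozen copy of the model, which the next-token-prediction setup does not require.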

Alternatives

No response

Additional context

No response

May 07 '24 15:05 jens5588

Thanks for the feedback! We are working on some examples and will let you know once they are integrated!

Aug 19 '24 18:08 init27