Self-Rewarding Language Model (wip)
Implementation of the training framework proposed in Self-Rewarding Language Models, from MetaAI
They really took the title of the DPO paper to heart.
May generalize the framework so one can add SPIN as well.
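As a rough illustration of the core loop from the paper (not this repository's API), below is a minimal sketch of one self-rewarding iteration: the model samples several candidate responses per prompt, scores them itself with an LLM-as-a-judge prompt, and the highest and lowest scored responses become a DPO preference pair. `generate_fn` and `judge_fn` are hypothetical stand-ins for whatever sampling and judging routines are used.

```python
import torch.nn.functional as F

def build_preference_pairs(prompts, generate_fn, judge_fn, num_candidates = 4):
    # self-rewarding data construction: the model both generates and judges
    pairs = []
    for prompt in prompts:
        candidates = [generate_fn(prompt) for _ in range(num_candidates)]
        scores = [judge_fn(prompt, c) for c in candidates]  # self-assigned rewards (e.g. 0-5)

        best  = max(range(len(scores)), key = scores.__getitem__)
        worst = min(range(len(scores)), key = scores.__getitem__)

        if scores[best] > scores[worst]:
            pairs.append((prompt, candidates[best], candidates[worst]))

    return pairs

def dpo_loss(pi_chosen_logp, pi_rejected_logp, ref_chosen_logp, ref_rejected_logp, beta = 0.1):
    # standard DPO objective applied to each self-rewarded (chosen, rejected) pair
    logits = beta * ((pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()
```

Each iteration of the paper's procedure would then fine-tune the model with DPO on the freshly constructed pairs, and the updated model becomes the generator and judge for the next iteration.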
Appreciation
- A16Z Open Source AI Grant Program and 🤗 Huggingface for the generous sponsorships, as well as my other sponsors, for affording me the independence to open source current artificial intelligence research
Citation
@misc{yuan2024selfrewarding,
    title   = {Self-Rewarding Language Models},
    author  = {Weizhe Yuan and Richard Yuanzhe Pang and Kyunghyun Cho and Sainbayar Sukhbaatar and Jing Xu and Jason Weston},
    year    = {2024},
    eprint  = {2401.10020},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}
@article{Chen2024SelfPlayFC,
    title   = {Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models},
    author  = {Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2401.01335},
    url     = {https://api.semanticscholar.org/CorpusID:266725672}
}
@article{Rafailov2023DirectPO,
    title   = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
    author  = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.18290},
    url     = {https://api.semanticscholar.org/CorpusID:258959321}
}