RLHF Tracker
### Tasks
- [ ] https://github.com/pytorch/torchtune/issues/2082
- [ ] Full-finetune distributed DPO recipe #1966
- [ ] #1262
- [ ] PPO tutorial/deep dive
- [ ] DPO tutorial/deep dive
- [ ] Multimodal support for DPO
- [ ] Sample packing for preference datasets
- [ ] General support for classification models for PPO and reward modelling
- [ ] Reward modelling recipe
- [ ] E2E RLHF blogpost
- [ ] Full-finetune distributed PPO recipe
These are some RLHF-related features we'd like to see in torchtune. If you're interested in working on any of these, please open a separate issue for the task and receive approval from a maintainer before opening a PR.