torchtune RLHF Tracker

Open SalmanMohammadi opened this issue 2 months ago • 0 comments

### Tasks
- [ ] https://github.com/pytorch/torchtune/issues/2082
- [ ] Full-finetune distributed DPO recipe #1966
- [ ] #1262
- [ ] PPO tutorial/deep dive
- [ ] DPO tutorial/deep dive
- [ ] Multimodal support for DPO
- [ ] Sample packing for preference datasets
- [ ] General support for classification models for PPO and reward modelling
- [ ] Reward modelling recipe
- [ ] E2E RLHF blogpost
- [ ] Full-finetune distributed PPO recipe

These are some RLHF-related features we'd like to see in torchtune. If you're interested in working on any of these, please open a separate issue for the task and receive approval from a maintainer before opening a PR.

Nov 27 '24 14:11 SalmanMohammadi
