First, thank you for your efforts in helping to bring accurate and performant RLHF techniques to the open-source community. I'm raising this issue hoping to get some clarification on a...
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [x] update tests and/or documentation
- [...
#### Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
If we want to help set a high bar for our contributors (and ourselves), we should be clearer about where many of the magic numbers in our tests come from...
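For example, here's a minimal sketch of the kind of provenance a hard-coded expected value could carry (a hypothetical test, not one from the repo; the constant is derived in the comment rather than left unexplained):

```python
import math

import torch
import torch.nn.functional as F


def test_dpo_loss_on_equal_logps():
    # Expected value is not a magic number: when chosen and rejected
    # log-probs are equal, the preference logits are zero and the loss
    # reduces to -log(sigmoid(0)) = log(2). Recording the derivation (or a
    # link to the script that generated the value) makes it auditable.
    logps = torch.zeros(4)
    loss = -F.logsigmoid(0.1 * (logps - logps)).mean()
    torch.testing.assert_close(loss, torch.tensor(math.log(2)))
```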
[CPO](https://arxiv.org/abs/2401.08417) seems like an interesting direct-preference-optimization-style loss function which, like [SimPO](https://arxiv.org/abs/2405.14734), eliminates the need for a reference model. There's also a reference implementation for the loss function in...
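For reference, a minimal sketch of the loss as described in the paper (the function name and signature are my own, not the reference implementation's):

```python
import torch
import torch.nn.functional as F


def cpo_loss(
    policy_chosen_logps: torch.Tensor,    # (B,) log-probs of chosen responses
    policy_rejected_logps: torch.Tensor,  # (B,) log-probs of rejected responses
    beta: float = 0.1,
) -> torch.Tensor:
    # Preference term: a DPO-style logistic loss, but with the reference-model
    # log-probs dropped (CPO approximates the reference policy with a uniform
    # prior, so those terms cancel out of the margin).
    preference_loss = -F.logsigmoid(
        beta * (policy_chosen_logps - policy_rejected_logps)
    ).mean()
    # Behavior-cloning regularizer from the paper: standard NLL on the
    # chosen responses.
    nll_loss = -policy_chosen_logps.mean()
    return preference_loss + nll_loss
```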
Regression tests in our repo exercise our recipes with full-size models and run nightly. We currently have only a single regression test, which finetunes Llama2-7B with...
See #1005 for some context. From @ebsmothers (and @joecummings):

> I don't love our collate utilities rn. In an ideal world I want two simple abstractions: right_padded_collate and left_padded_collate, and...
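As a rough illustration of what those two abstractions might look like (signatures and the `tokens` key are assumptions, not the actual torchtune API):

```python
from typing import Dict, List

import torch
import torch.nn.functional as F


def right_padded_collate(
    batch: List[Dict[str, List[int]]], padding_idx: int = 0
) -> torch.Tensor:
    # Pad every sample on the right up to the longest sequence in the batch.
    max_len = max(len(sample["tokens"]) for sample in batch)
    return torch.stack(
        [
            F.pad(
                torch.tensor(sample["tokens"]),
                (0, max_len - len(sample["tokens"])),
                value=padding_idx,
            )
            for sample in batch
        ]
    )


def left_padded_collate(
    batch: List[Dict[str, List[int]]], padding_idx: int = 0
) -> torch.Tensor:
    # Pad on the left instead, which keeps the final tokens aligned for
    # decoder-only generation.
    max_len = max(len(sample["tokens"]) for sample in batch)
    return torch.stack(
        [
            F.pad(
                torch.tensor(sample["tokens"]),
                (max_len - len(sample["tokens"]), 0),
                value=padding_idx,
            )
            for sample in batch
        ]
    )
```

E.g. `right_padded_collate([{"tokens": [1, 2, 3]}, {"tokens": [4]}])` gives `[[1, 2, 3], [4, 0, 0]]`, while the left-padded variant gives `[[1, 2, 3], [0, 0, 4]]`.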
nit: `self._is_model_compiled` or similar is a bit more clear

_Originally posted by @RdoubleA in https://github.com/pytorch/torchtune/pull/1223#discussion_r1710027730_
See [this issue](https://github.com/huggingface/trl/issues/1677) and https://huggingface.co/blog/pref-tuning. The DPO recipe should pass the average, not the summed, logprobs into `IPOLoss`, similar to SimPO.
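A minimal sketch of the distinction (shapes and names are assumptions, not torchtune's actual signatures):

```python
import torch


def average_logps(
    per_token_logps: torch.Tensor,  # (batch, seq_len) log-probs of label tokens
    loss_mask: torch.Tensor,        # (batch, seq_len), 1 for response tokens else 0
) -> torch.Tensor:
    # Length-normalized log-probs: sum over response tokens / token count.
    # Summing without normalizing is what DPO uses; IPO (per the linked
    # issue) and SimPO expect the average.
    summed = (per_token_logps * loss_mask).sum(dim=-1)
    return summed / loss_mask.sum(dim=-1)


def ipo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    reference_chosen_logps: torch.Tensor,
    reference_rejected_logps: torch.Tensor,
    tau: float = 0.1,
) -> torch.Tensor:
    # IPO regresses the reference-adjusted log-prob margin toward 1/(2*tau);
    # feeding in averaged rather than summed logps keeps sequence length
    # from dominating the margin.
    logits = (policy_chosen_logps - policy_rejected_logps) - (
        reference_chosen_logps - reference_rejected_logps
    )
    return ((logits - 1 / (2 * tau)) ** 2).mean()
```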
Currently, `tune ls` is a bit unwieldy. Can we make it better? @joecummings