First, thank you for your efforts in helping to bring accurate and performant RLHF techniques to the open-source community. I'm raising this issue hoping to get some clarification on a...
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [x] update tests and/or documentation
- [...
#### Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
If we want to help set a high bar for our contributors (and ourselves), we should be clearer about where many of the magic numbers in our tests come from...
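For example, here's a minimal sketch of the kind of provenance a hard-coded expected value could carry (a hypothetical test, not one from the repo; the constant is derived in the comment rather than left unexplained):

```python
import math

import torch
import torch.nn.functional as F


def test_dpo_loss_on_equal_logps():
    # Expected value is not a magic number: when chosen and rejected
    # log-probs are equal, the preference logits are zero and the loss
    # reduces to -log(sigmoid(0)) = log(2). Recording the derivation (or a
    # link to the script that generated the value) makes it auditable.
    logps = torch.zeros(4)
    loss = -F.logsigmoid(0.1 * (logps - logps)).mean()
    torch.testing.assert_close(loss, torch.tensor(math.log(2)))
```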
[CPO](https://arxiv.org/abs/2401.08417) seems like an interesting direct-preference-optimization-style loss function which, like [SimPO](https://arxiv.org/abs/2405.14734), eliminates the need for a reference model. There's also a reference implementation for the loss function in...
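For reference, a minimal sketch of the loss as described in the paper (the function name and signature are my own, not the reference implementation's):

```python
import torch
import torch.nn.functional as F


def cpo_loss(
    policy_chosen_logps: torch.Tensor,    # (B,) log-probs of chosen responses
    policy_rejected_logps: torch.Tensor,  # (B,) log-probs of rejected responses
    beta: float = 0.1,
) -> torch.Tensor:
    # Preference term: a DPO-style logistic loss, but with the reference-model
    # log-probs dropped (CPO approximates the reference policy with a uniform
    # prior, so those terms cancel out of the margin).
    preference_loss = -F.logsigmoid(
        beta * (policy_chosen_logps - policy_rejected_logps)
    ).mean()
    # Behavior-cloning regularizer from the paper: standard NLL on the
    # chosen responses.
    nll_loss = -policy_chosen_logps.mean()
    return preference_loss + nll_loss
```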
Regression tests in our repo exercise our recipes with full-size models and run nightly. We currently have only a single regression test, which finetunes Llama2-7B with...
See #1005 for some context. From @ebsmothers (and @joecummings):

> I don't love our collate utilities rn. In an ideal world I want two simple abstractions: right_padded_collate and left_padded_collate, and...
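As a rough illustration of what those two abstractions might look like (signatures and the `tokens` key are assumptions, not the actual torchtune API):

```python
from typing import Dict, List

import torch
import torch.nn.functional as F


def right_padded_collate(
    batch: List[Dict[str, List[int]]], padding_idx: int = 0
) -> torch.Tensor:
    # Pad every sample on the right up to the longest sequence in the batch.
    max_len = max(len(sample["tokens"]) for sample in batch)
    return torch.stack(
        [
            F.pad(
                torch.tensor(sample["tokens"]),
                (0, max_len - len(sample["tokens"])),
                value=padding_idx,
            )
            for sample in batch
        ]
    )


def left_padded_collate(
    batch: List[Dict[str, List[int]]], padding_idx: int = 0
) -> torch.Tensor:
    # Pad on the left instead, which keeps the final tokens aligned for
    # decoder-only generation.
    max_len = max(len(sample["tokens"]) for sample in batch)
    return torch.stack(
        [
            F.pad(
                torch.tensor(sample["tokens"]),
                (max_len - len(sample["tokens"]), 0),
                value=padding_idx,
            )
            for sample in batch
        ]
    )
```

E.g. `right_padded_collate([{"tokens": [1, 2, 3]}, {"tokens": [4]}])` gives `[[1, 2, 3], [4, 0, 0]]`, while the left-padded variant gives `[[1, 2, 3], [0, 0, 4]]`.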
nit: `self._is_model_compiled` or similar is a bit more clear

_Originally posted by @RdoubleA in https://github.com/pytorch/torchtune/pull/1223#discussion_r1710027730_
See [this issue](https://github.com/huggingface/trl/issues/1677) and https://huggingface.co/blog/pref-tuning. The DPO recipe should pass the average, not the summed, logprobs into `IPOLoss`, similar to SimPO.
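A minimal sketch of the distinction (shapes and names are assumptions, not torchtune's actual signatures):

```python
import torch


def average_logps(
    per_token_logps: torch.Tensor,  # (batch, seq_len) log-probs of label tokens
    loss_mask: torch.Tensor,        # (batch, seq_len), 1 for response tokens else 0
) -> torch.Tensor:
    # Length-normalized log-probs: sum over response tokens / token count.
    # Summing without normalizing is what DPO uses; IPO (per the linked
    # issue) and SimPO expect the average.
    summed = (per_token_logps * loss_mask).sum(dim=-1)
    return summed / loss_mask.sum(dim=-1)


def ipo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    reference_chosen_logps: torch.Tensor,
    reference_rejected_logps: torch.Tensor,
    tau: float = 0.1,
) -> torch.Tensor:
    # IPO regresses the reference-adjusted log-prob margin toward 1/(2*tau);
    # feeding in averaged rather than summed logps keeps sequence length
    # from dominating the margin.
    logits = (policy_chosen_logps - policy_rejected_logps) - (
        reference_chosen_logps - reference_rejected_logps
    )
    return ((logits - 1 / (2 * tau)) ** 2).mean()
```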
Currently, `tune ls` is a bit unwieldy. Can we make it better? @joecummings