Mark

30 issues by Mark

Really simple: just add an argument that is already supported.

best practice
community help wanted

#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [X] fix a bug
- [ ] update tests and/or...

CLA Signed

The RL CI is not working properly.

CLA Signed

#### Context
What is the purpose of this PR? Is it to
- [X] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...

CLA Signed

We're encountering an out-of-memory (OOM) error on a 2TB machine when attempting to load `Qwen3-235B-A22B`. The issue seems to be that `torchtune` loads the full, unsharded checkpoint with `self._checkpoint_client.load_base_checkpoint()`. For...

#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [X] fix a bug
- [ ] update tests and/or...

CLA Signed

#### Context
What is the purpose of this PR? Is it to
- [X] add a new feature
- [ ] fix a bug
- [X] update tests and/or documentation...

CLA Signed

Reward modeling in torchtune RFC

Core issues

We do not have an out-of-the-box toolkit for state-of-the-art reward modeling in torchtune; while the standard reward models are usually trained by...

CLA Signed

#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [x] fix a bug
- [ ] update tests and/or...

CLA Signed

RLHF procedures with modern DPO functionals may lead to a degenerate solution, e.g., the EOS token being dropped during generation. For instance, consider the output of the Qwen2.5 model...

enhancement