Mark
Really simple: this just adds an argument that is already supported.
#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [X] fix a bug
- [ ] update tests and/or...

#### Context
What is the purpose of this PR? Is it to
- [X] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
We're encountering an out-of-memory (OOM) error on a 2TB machine when attempting to load `Qwen3-235B-A22B`. The issue seems to be that `torchtune` loads the full, unsharded checkpoint with `self._checkpoint_client.load_base_checkpoint()`. For...
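To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch. It assumes bf16 weights (2 bytes per parameter), 8 ranks on the node, and that each rank materializes its own full copy of the checkpoint in host memory; none of these details are stated in the report above, and `load_memory_gb` is a hypothetical helper, not a torchtune function.

```python
def load_memory_gb(num_params: int, bytes_per_param: int, num_copies: int) -> float:
    """Host memory (in GB) needed to hold `num_copies` full copies of the weights."""
    return num_params * bytes_per_param * num_copies / 1e9


NUM_PARAMS = 235_000_000_000  # taken from the "235B" in the model name
BYTES_PER_PARAM = 2           # assumption: bf16 weights
NUM_RANKS = 8                 # assumption: 8 processes on one node

# Assumed scenario: every rank loads the full, unsharded checkpoint.
unsharded_total = load_memory_gb(NUM_PARAMS, BYTES_PER_PARAM, NUM_RANKS)

# For comparison: a single copy split evenly across the ranks.
sharded_total = load_memory_gb(NUM_PARAMS, BYTES_PER_PARAM, 1)

print(f"full copy on every rank: {unsharded_total:.0f} GB")
print(f"one copy, sharded:       {sharded_total:.0f} GB")
```

Under these assumptions, the unsharded path would need well over 2 TB of host memory, while a single sharded copy would fit comfortably.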
#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [X] fix a bug
- [ ] update tests and/or...

#### Context
What is the purpose of this PR? Is it to
- [X] add a new feature
- [ ] fix a bug
- [X] update tests and/or documentation...
Reward modeling in torchtune RFC. Core issues: we do not have an out-of-the-box toolkit for state-of-the-art reward modeling in torchtune. While the standard reward models are usually trained by...
#### Context
What is the purpose of this PR? Is it to
- [ ] add a new feature
- [x] fix a bug
- [ ] update tests and/or...
RLHF procedures with modern DPO functionals can converge to degenerate solutions, e.g., dropping the EOS token during generation. For instance, let's consider the output of the Qwen2.5 model...
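For reference, the standard DPO objective for a single preference pair can be sketched as below. This is the textbook formulation written in plain Python, not torchtune's implementation; the function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative temperature value
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Note that the loss depends only on the margin between the two log-ratio terms, not on the absolute likelihood of the chosen response. This sketch does not by itself show what the report above identifies as the cause of the EOS-dropping behavior.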