Mark issues

Results: 30 issues of Mark

Bidirectional truncation in Llama4.

Really simple: just add an argument that is already supported.

best practice

community help wanted

Fix GRPO recipe

#### Context What is the purpose of this PR? Is it to - [ ] add a new feature - [X] fix a bug - [ ] update tests and/or...

CLA Signed

[WIP] Proper tool calling support in the torchtune

#### Context What is the purpose of this PR? Is it to - [X] add a new feature - [ ] fix a bug - [ ] update tests and/or...

CLA Signed

Qwen3-235B-A22B OOMs with sufficient amount of VRAM

We're encountering an out-of-memory (OOM) error on a 2TB machine when attempting to load `Qwen3-235B-A22B`. The issue seems to be that `torchtune` loads the full, unsharded checkpoint with `self._checkpoint_client.load_base_checkpoint()`. For...

Reimplement `batched_rewards` to fix compatibility, broken in #2698

#### Context What is the purpose of this PR? Is it to - [ ] add a new feature - [X] fix a bug - [ ] update tests and/or...

CLA Signed

Reference-free DPO losses in torchtune.

#### Context What is the purpose of this PR? Is it to - [X] add a new feature - [ ] fix a bug - [X] update tests and/or documentation...

CLA Signed

[RFC] Reward modeling

Reward modeling in torchtune RFC. Core issues: We do not have an out-of-the-box toolkit to perform state-of-the-art reward modeling in torchtune. While the standard reward models are usually trained by...

CLA Signed

Fix command in config

#### Context What is the purpose of this PR? Is it to - [ ] add a new feature - [x] fix a bug - [ ] update tests and/or...

CLA Signed

Option to clip logprobs `rlhf.get_batch_log_probs`

RLHF procedures with modern DPO functionals may lead to a degenerate solution, e.g., the EOS token being dropped during generation. For instance, let's consider the output of the Qwen2.5 model...

enhancement