Results 213 issues of Vadim Kantorov

Given how much RL/GRPO needs both LLM training (via FSDP) and inference (often with vLLM), I wonder if it's time to upstream some basic components / utils for...

We just hit OOM, revealing that by default torchtune does not use torch.compile and that it does not use fused linear cross entropy yet... I found the following report from...

discussion

### Bug description

Does torchtitan provide any recipes for implementing batch skipping / OOM recovery in a multi-node FSDP setup? In RL/GRPO training this is very pertinent (where we...

question
post training