salman
Results: 41 issues of salman
- Activation offloading (see implementation [here](https://github.com/pytorch/torchtune/blob/main/torchtune/training/_activation_offloading.py))
- Fusing the optimizer step into the backward pass (see implementation [here](https://github.com/pytorch/torchtune/blob/main/torchtune/training/memory.py#L219))
- Using `full_shard` behaviour via `reshard_after_forward` (see [here](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md#fsdp1--fsdp2-api-differences)). I wasn't 100% sure if I could see...
enhancement