Chien-Chin Huang issues

Results 28 issues of


[DSD] Fix to remove non_persistent buffer in distributed state dict

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* #125339
* #125338
* __->__ #125337
* #125336
* #125335
* #125334
* #125333

Summary: Fixes #122792. `state_dict` includes only persistent buffers, while...
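The summary notes that `state_dict` includes only persistent buffers. The following is a minimal stdlib sketch of that filtering rule. It assumes buffers are keyed by fully qualified name (FQN) and that each submodule records the local names of buffers registered with `persistent=False`. The dictionaries and the `filter_persistent_buffers` helper are hypothetical stand-ins, not the distributed state dict implementation.

```python
from typing import Dict, Set


def filter_persistent_buffers(
    buffers: Dict[str, object],
    non_persistent: Dict[str, Set[str]],
) -> Dict[str, object]:
    """Keep only the buffers a regular state_dict would include (hypothetical helper).

    buffers: maps a buffer FQN such as "layers.0.freqs_cis" to its value.
    non_persistent: maps a module prefix ("" for the root module) to the local
        names of its non-persistent buffers (hypothetical layout).
    """
    kept = {}
    for fqn, value in buffers.items():
        # Split "layers.0.freqs_cis" into prefix "layers.0" and local name "freqs_cis".
        prefix, _, local_name = fqn.rpartition(".")
        if local_name in non_persistent.get(prefix, set()):
            continue  # non-persistent buffers are left out of the state dict
        kept[fqn] = value
    return kept


buffers = {"norm.running_mean": 0.0, "layers.0.freqs_cis": 1.0, "step": 3}
non_persistent = {"layers.0": {"freqs_cis"}}
print(filter_persistent_buffers(buffers, non_persistent))
# {'norm.running_mean': 0.0, 'step': 3}
```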

oncall: distributed

ciflow/trunk

ciflow/periodic

module: distributed_checkpoint

[PT2D] Ensure the trace rules are correct with distributed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* #125339
* #125338
* #125337
* #125336
* #125335
* #125334
* __->__ #125333

Summary:

1. Avoid using `torch._dynamo.disable`.
2. Clear the LRU...
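Item 2 of the summary is truncated. The sketch below only illustrates why a memoized rule lookup can return stale answers until its cache is cleared. It assumes the item refers to a `functools.lru_cache`-decorated lookup. The rule table and `lookup_rule` are hypothetical and are not Dynamo's trace rules.

```python
import functools

# Hypothetical mutable rule table consulted by a cached lookup.
_RULES = {"my_pkg.helper": "skip"}


@functools.lru_cache(maxsize=None)
def lookup_rule(name: str) -> str:
    # Results are memoized, so later edits to _RULES are not observed
    # until cache_clear() is called.
    return _RULES.get(name, "inline")


before = lookup_rule("my_pkg.helper")    # computed and cached: "skip"
_RULES["my_pkg.helper"] = "inline"       # the rule changes at runtime
stale = lookup_rule("my_pkg.helper")     # served from the cache: still "skip"
lookup_rule.cache_clear()                # drop all memoized results
fresh = lookup_rule("my_pkg.helper")     # recomputed: "inline"
print(before, stale, fresh)  # skip skip inline
```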

oncall: distributed

ciflow/trunk

release notes: distributed (c10d)

module: dynamo

ciflow/inductor

Fix the incorrect step log for profiler after resuming from a checkpoint

Summary: The profiler maintains its own local step counter, and that counter is not synchronized with the train step restored from the checkpoint, so the profiler logs the wrong step after resuming. This PR fixes the issue.
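The following is a minimal stdlib sketch of the idea behind the fix. It assumes the profiler's counter can be seeded with the train step restored from the checkpoint. `StepSyncedProfiler`, `profile_freq`, and the trace names are hypothetical and are not the PR's code.

```python
from typing import List


class StepSyncedProfiler:
    """Hypothetical profiler wrapper whose counter tracks the global train step."""

    def __init__(self, profile_freq: int, start_step: int = 0) -> None:
        self.profile_freq = profile_freq
        # Seed the counter from the checkpointed step instead of 0, so the
        # step the profiler logs matches the trainer's step after a resume.
        self.step_num = start_step
        self.traces: List[str] = []

    def step(self) -> None:
        # Called once per completed train step.
        self.step_num += 1
        if self.step_num % self.profile_freq == 0:
            self.traces.append(f"iteration_{self.step_num}")


# Resume at step 100 (as restored from a checkpoint) and train 20 more steps.
prof = StepSyncedProfiler(profile_freq=10, start_step=100)
for _ in range(20):
    prof.step()
print(prof.traces)  # ['iteration_110', 'iteration_120']
```

With `start_step=0`, the same loop would label its traces `iteration_10` and `iteration_20`. That label mismatch is the kind of incorrect step log the summary describes.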

CLA Signed

Implement async_checkpoint

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #302

Summary: This PR implements two different async checkpointing approaches. The first uses DCP.async_save; the other uses pinned...
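Neither DCP.async_save nor the truncated second approach is reproduced here. Instead, this is a generic stdlib-only sketch of the common pattern behind async checkpointing: take a blocking snapshot of the state, then write it in the background while training continues. `async_save_sketch` is a hypothetical helper. `pickle` stands in for DCP's storage layer, and `copy.deepcopy` stands in for the device-to-CPU copy a PyTorch implementation would perform.

```python
import copy
import os
import pickle
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker runs background writes one at a time, in submission order.
_executor = ThreadPoolExecutor(max_workers=1)


def async_save_sketch(state: dict, path: str) -> Future:
    """Hypothetical helper: snapshot now, write to `path` in the background."""
    # Blocking step: copy the state so later in-place updates by the
    # training loop cannot leak into the checkpoint being written.
    snapshot = copy.deepcopy(state)

    def _write() -> str:
        # Non-blocking step: serialization and file I/O run off the main thread.
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)
        return path

    return _executor.submit(_write)


state = {"step": 10, "weights": [0.1, 0.2]}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
fut = async_save_sketch(state, path)
state["step"] = 11          # training continues while the write is in flight
state["weights"][0] = 0.5   # in-place update; the snapshot is unaffected
fut.result()                # wait before starting another save
with open(path, "rb") as f:
    print(pickle.load(f))   # {'step': 10, 'weights': [0.1, 0.2]}
```

Waiting on the previous future before starting the next save keeps at most one checkpoint in flight. This bounds the extra memory held by snapshots.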

CLA Signed

Add support for DDP and CompiledAutograd

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #319

CLA Signed

[RFC] Allow ModelWrapper and OptimizerWrapper to accept multiple models and optimizers

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #360
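The following is a minimal stdlib sketch of a wrapper that presents several model parts, such as pipeline stages, as a single state dict. It assumes each part exposes `state_dict()`/`load_state_dict()` and that the parts own disjoint keys. `MultiModelWrapper` and `ToyPart` are hypothetical and are not the RFC's actual `ModelWrapper`. An optimizer wrapper could follow the same shape.

```python
from typing import Any, Dict, List, Union


class ToyPart:
    """Hypothetical stand-in for one model part (e.g., one pipeline stage)."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = dict(params)

    def state_dict(self) -> Dict[str, Any]:
        return dict(self.params)

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.params.update(state_dict)


class MultiModelWrapper:
    """Hypothetical wrapper exposing one or more model parts as one state dict."""

    def __init__(self, models: Union[ToyPart, List[ToyPart]]) -> None:
        # Accept a single model or a list of models.
        self.models = models if isinstance(models, list) else [models]

    def state_dict(self) -> Dict[str, Any]:
        merged: Dict[str, Any] = {}
        for model in self.models:
            part = model.state_dict()
            overlap = merged.keys() & part.keys()
            if overlap:
                raise ValueError(f"duplicate keys across models: {sorted(overlap)}")
            merged.update(part)
        return merged

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Each part loads only the keys it owns (non-strict loading).
        for model in self.models:
            own = model.state_dict().keys()
            model.load_state_dict({k: state_dict[k] for k in own if k in state_dict})


stage0 = ToyPart({"tok_embeddings.weight": 1, "layers.0.w": 2})
stage1 = ToyPart({"layers.1.w": 3, "output.weight": 4})
wrapper = MultiModelWrapper([stage0, stage1])
print(sorted(wrapper.state_dict()))
# ['layers.0.w', 'layers.1.w', 'output.weight', 'tok_embeddings.weight']
```

Raising on duplicate keys is a design choice of this sketch. Merging into one flat dict is only safe when the parts' keys do not collide.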

CLA Signed

[RFC] Enable HSDP

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #518

This PR enables HSDP.

**Discussions**

**1. How does trainer get DP mesh?**

Right now, we flatten `["dp_replicate", "dp_shard"]` into a flattened...
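The following stdlib-only sketch makes the HSDP layout and the flattened data-parallel (DP) dimension concrete. It assumes 8 ranks laid out row-major as a 2×4 mesh with dimensions `("dp_replicate", "dp_shard")` and no other parallelism. `hsdp_groups` and `flattened_dp_rank` are hypothetical helpers, not DeviceMesh APIs.

```python
from typing import List, Tuple


def hsdp_groups(dp_replicate: int, dp_shard: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Rank groups for a row-major 2D mesh of shape (dp_replicate, dp_shard)."""
    mesh = [[r * dp_shard + s for s in range(dp_shard)] for r in range(dp_replicate)]
    # Parameters are sharded among the ranks in one row ...
    shard_groups = mesh
    # ... and replicated across rows (ranks sharing the same shard index).
    replicate_groups = [[mesh[r][s] for r in range(dp_replicate)] for s in range(dp_shard)]
    return shard_groups, replicate_groups


def flattened_dp_rank(replicate_idx: int, shard_idx: int, dp_shard: int) -> int:
    # Flattening ["dp_replicate", "dp_shard"] yields one DP dimension of size
    # dp_replicate * dp_shard; for example, a data loader could split its
    # input by this rank.
    return replicate_idx * dp_shard + shard_idx


shard, replicate = hsdp_groups(dp_replicate=2, dp_shard=4)
print(shard)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(replicate)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(flattened_dp_rank(1, 2, dp_shard=4))  # 6
```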

CLA Signed

Enable CP

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #433

This PR adds experimental flags and functions to enable context parallelism (CP). We currently support only FSDP + CP and CP...

CLA Signed

[WIP] Implement llama4 HF format to DCP converter

**Why do we need this?**

There have been many requests to make HF checkpoints work with TorchTitan. Workarounds for this problem already exist. However, the converted...

CLA Signed

Add TorchFT integration test

CLA Signed