Chien-Chin Huang

Results: 28 issues authored by Chien-Chin Huang

Summary: Several users have been asking for this feature: https://github.com/pytorch/torchtitan/issues/1177

TODO: Remove fp8 subclass tensor
TODO: Support HF format

Test Plan:
```
CONFIG_FILE="torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --training.compile --parallelism.tensor_parallel_degree 4 --parallelism.enable_async_tensor_parallel --checkpoint.model_weights_only --checkpoint.unshard_weights --checkpoint.export_dtype="bfloat16"...
```

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #166991 While `_templated_ring_attention` is a private API, it is unfortunately used by some packages. Add it to `__all__` so that people can...

Labels: oncall: distributed · ciflow/inductor · module: context parallel · release notes: context parallel
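The snippet above is about exporting `_templated_ring_attention` via `__all__`. As a minimal illustration of why that matters (the module built here is synthetic, not the actual PyTorch module): a wildcard import skips underscore-prefixed names unless they are listed in `__all__`.

```python
# Illustrative only: build a throwaway module containing an
# underscore-prefixed function, list it in __all__, and show that
# "from module import *" then picks it up despite the underscore.
import sys
import types

mod = types.ModuleType("attn_mod")  # hypothetical module name
exec(
    "def _templated_ring_attention():\n"
    "    return 'ring'\n"
    "# Wildcard import skips _names unless __all__ lists them explicitly.\n"
    "__all__ = ['_templated_ring_attention']\n",
    mod.__dict__,
)
sys.modules["attn_mod"] = mod

ns = {}
exec("from attn_mod import *", ns)
print("_templated_ring_attention" in ns)  # True: exported via __all__
```

Without the `__all__` entry, the final check would print `False`, which is exactly the breakage downstream packages would see.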

**Summary** This PR utilizes the latest APIs provided by DeviceMesh to simplify the creation of all the different meshes. The design philosophy is as follows: 1. Create one world mesh with...
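The "one world mesh" idea above can be sketched in plain Python. This is a toy model, not the actual DeviceMesh API: it reshapes a flat list of ranks into a 2x4 grid with hypothetical dim names `("dp", "tp")` and slices out the sub-mesh a given rank belongs to along either dimension.

```python
# Toy illustration of hierarchical mesh slicing (not torch.distributed code):
# a flat list of 8 ranks becomes a 2x4 grid; slicing along "tp" yields a
# rank's row, slicing along "dp" yields its column.
def make_mesh(ranks, shape, dim_names):
    dp, tp = shape
    grid = [ranks[i * tp:(i + 1) * tp] for i in range(dp)]
    return {"grid": grid, "dims": dim_names}

def slice_dim(mesh, rank, dim):
    # Return the group of ranks that share every coordinate with `rank`
    # except the one along `dim`.
    for row in mesh["grid"]:
        if rank in row:
            j = row.index(rank)
            if dim == "tp":
                return row
            return [r[j] for r in mesh["grid"]]

world = make_mesh(list(range(8)), (2, 4), ("dp", "tp"))
print(slice_dim(world, 5, "tp"))  # [4, 5, 6, 7]
print(slice_dim(world, 5, "dp"))  # [1, 5]
```

The appeal of creating one world mesh first is visible even in this sketch: every process-group-like slice is derived from the single grid, rather than being constructed independently per parallelism dimension.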


Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #2049 * #2029 This PR introduces an initial prototype and skeleton for fully DTensor-based training. The current...


The recent pull request [#2012](https://github.com/pytorch/torchtitan/pull/2012/) introduces a dry run mode for TorchTitan. However, the current implementation restricts the dry run functionality to the configuration system only. This limitation means that...

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #1857 * #1939 This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. This PR removes the usage...


If we don't wait for the first quorum, the trainer will continue to run the forward pass and may use incorrect weights while it is healing.
