Chien-Chin Huang

Results: 28 issues authored by Chien-Chin Huang

Summary: Several users have been asking for this feature: https://github.com/pytorch/torchtitan/issues/1177

TODO: Remove fp8 subclass tensor
TODO: Support HF format

Test Plan:
```
CONFIG_FILE="torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --training.compile --parallelism.tensor_parallel_degree 4 --parallelism.enable_async_tensor_parallel --checkpoint.model_weights_only --checkpoint.unshard_weights --checkpoint.export_dtype="bfloat16"...
```

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #166991 While `_templated_ring_attention` is a private API, it is unfortunately used by some packages. Add it to `__all__` so that people can...

Labels: oncall: distributed · ciflow/inductor · module: context parallel · release notes: context parallel
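The snippet above is about exporting `_templated_ring_attention` via `__all__`. As a minimal illustration of why that matters (the module built here is synthetic, not the actual PyTorch module): a wildcard import skips underscore-prefixed names unless they are listed in `__all__`.

```python
# Illustrative only: build a throwaway module containing an
# underscore-prefixed function, list it in __all__, and show that
# "from module import *" then picks it up despite the underscore.
import sys
import types

mod = types.ModuleType("attn_mod")  # hypothetical module name
exec(
    "def _templated_ring_attention():\n"
    "    return 'ring'\n"
    "# Wildcard import skips _names unless __all__ lists them explicitly.\n"
    "__all__ = ['_templated_ring_attention']\n",
    mod.__dict__,
)
sys.modules["attn_mod"] = mod

ns = {}
exec("from attn_mod import *", ns)
print("_templated_ring_attention" in ns)  # True: exported via __all__
```

Without the `__all__` entry, the final check would print `False`, which is exactly the breakage downstream packages would see.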

**Summary** This PR utilizes the latest APIs provided by DeviceMesh to simplify the creation of all the different meshes. The design philosophy is as follows: 1. Create one world mesh with...
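The "one world mesh" idea above can be sketched in plain Python. This is a toy model, not the actual DeviceMesh API: it reshapes a flat list of ranks into a 2x4 grid with hypothetical dim names `("dp", "tp")` and slices out the sub-mesh a given rank belongs to along either dimension.

```python
# Toy illustration of hierarchical mesh slicing (not torch.distributed code):
# a flat list of 8 ranks becomes a 2x4 grid; slicing along "tp" yields a
# rank's row, slicing along "dp" yields its column.
def make_mesh(ranks, shape, dim_names):
    dp, tp = shape
    grid = [ranks[i * tp:(i + 1) * tp] for i in range(dp)]
    return {"grid": grid, "dims": dim_names}

def slice_dim(mesh, rank, dim):
    # Return the group of ranks that share every coordinate with `rank`
    # except the one along `dim`.
    for row in mesh["grid"]:
        if rank in row:
            j = row.index(rank)
            if dim == "tp":
                return row
            return [r[j] for r in mesh["grid"]]

world = make_mesh(list(range(8)), (2, 4), ("dp", "tp"))
print(slice_dim(world, 5, "tp"))  # [4, 5, 6, 7]
print(slice_dim(world, 5, "dp"))  # [1, 5]
```

The appeal of creating one world mesh first is visible even in this sketch: every process-group-like slice is derived from the single grid, rather than being constructed independently per parallelism dimension.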


Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #2049 * #2029 This PR introduces an initial prototype and skeleton for fully DTensor-based training. The current...


The recent pull request [#2012](https://github.com/pytorch/torchtitan/pull/2012/) introduces a dry run mode for TorchTitan. However, the current implementation restricts the dry run functionality to the configuration system only. This limitation means that...

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom): * __->__ #1857 * #1939 This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. This PR removes the usage...


If we don't wait for the first quorum, the trainer will continue to run the forward pass and may use incorrect weights while it is healing.
