Ke Wen

Results 36 issues of Ke Wen

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #180 Status: - Switched to DTensor based TP in regular tensor path - Result is correct, but there is a perf gap...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #379 * #362 * #381 Separate the addition of 2D test from original PR #362 for easier review and landing. Also changed...

CLA Signed

# What does this PR do? Non-persistent buffers are not saved in the state dict. In the case of meta init, while loading the state dict from a checkpoint can fill in parameters...

Added files: - model_dist.py: a mirror of model.py with Tensor Parallelism baked in. - dist_run.py: a toy example of how to run the model in a distributed way. Test: ``` torchrun --nproc-per-node...

CLA Signed

When composing distributed with quantization, one potential case is that the model has already been quantized and saved, so a second run does not need to quantize it again. This is...

CLA Signed

### 🚀 The feature, motivation and pitch This is for aligning the distributed load behavior with the single-device case. Today distributed relies on an index file containing a `param->bin` mapping to limit...

### 🐛 Describe the bug ``` torchrun --nproc-per-node 8 dist_run.py ``` ``` known configs: ['13B', '30B', '34B', '70B', '7B', 'CodeLlama-7b-Python-hf', 'Mistral-7B', 'stories110M', 'stories15M', 'stories42M', 'Meta-Llama-3-70B', 'Meta-Llama-3-8B', 'Meta-Llama-3.1-70B-Tune', 'Meta-Llama-3.1-70B', 'Meta-Llama-3.1-8B-Tune', 'Meta-Llama-3.1-8B']...

Distributed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #137763 * __->__ #135273 * #137161 * #138178 This PR contains multiple fixes for issue https://github.com/pytorch/pytorch/issues/135279: ## First part: Moves the GPU guard...

oncall: distributed
ciflow/trunk
release notes: distributed (c10d)
topic: bug fixes
ciflow/periodic
ciflow/inductor
keep-going

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #137763 * #135273 * __->__ #137161 * #138178 cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

oncall: distributed
ciflow/trunk
topic: not user facing
ciflow/periodic
ciflow/inductor
keep-going
ciflow/rocm
ci-no-td

This test was disabled because it is failing on the main branch ([recent examples](https://torch-ci.com/failure?failureCaptures=%5B%22distributed%2Ftest_c10d_nccl.py%3A%3ANcclErrorHandlingTest%3A%3Atest_get_future_result%22%5D)). cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

oncall: distributed
skipped