tianyu-l

Results 20 issues of tianyu-l

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #119877

Loss parallel is the last piece of sequence parallelism to enable. It enables efficient distributed cross-entropy computation when the input...

oncall: distributed
ciflow/trunk
ciflow/inductor
release notes: distributed (dtensor)
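To illustrate the idea behind distributed cross-entropy, here is a minimal single-process sketch in Python. The description above is truncated, so the sketch *assumes* the logits are sharded along the class dimension across ranks. The inner lists and the comments marking where an all-reduce would happen are hypothetical stand-ins for real collectives; this is not the DTensor implementation from the PR. The point is that no rank needs the full row of logits: only a max, a sum of exponentials, and the target logit cross shard boundaries.

```python
import math
from typing import List


def full_cross_entropy(logits: List[float], target: int) -> float:
    # Reference on one device: -log softmax(logits)[target], computed stably.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sharded_cross_entropy(shards: List[List[float]], target: int) -> float:
    # Each inner list is one (simulated) rank's slice of the class dimension.
    # Step 1: local max per rank; an all-reduce(MAX) would give the global max.
    global_max = max(max(s) for s in shards)
    # Step 2: local sum of exp(x - global_max); an all-reduce(SUM) would combine them.
    global_sum = sum(sum(math.exp(x - global_max) for x in s) for s in shards)
    log_z = global_max + math.log(global_sum)
    # Step 3: only the rank owning the target class contributes its logit;
    # an all-reduce(SUM) would share it with every rank.
    target_logit = 0.0
    offset = 0
    for s in shards:
        if offset <= target < offset + len(s):
            target_logit += s[target - offset]
        offset += len(s)
    return log_z - target_logit


logits = [2.0, 1.0, 0.1, -1.0, 3.0, 0.5]
print(full_cross_entropy(logits, 4))
print(sharded_cross_entropy([logits[:3], logits[3:]], 4))  # same value, up to rounding
```

Since only a few scalars per sample are exchanged, the communication cost does not grow with the number of classes held on each shard.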

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #280

Per the suggestion in #274, this PR removes the embedding from the number-of-parameters calculation, because the embedding op doesn't do a matmul. This PR...

CLA Signed
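A minimal sketch of a parameter count that leaves the embedding out. It uses a plain dict of parameter shapes so it runs without PyTorch; the parameter names, the `tok_embeddings.` prefix, and the toy sizes are all hypothetical. In a real PyTorch model the same filter would be applied while iterating `named_parameters()`.

```python
import math
from typing import Dict, Tuple


def count_params(
    shapes: Dict[str, Tuple[int, ...]],
    exclude_embedding: bool = True,
    embedding_prefix: str = "tok_embeddings.",  # hypothetical naming convention
) -> int:
    """Sum parameter element counts, optionally skipping the embedding table."""
    total = 0
    for name, shape in shapes.items():
        if exclude_embedding and name.startswith(embedding_prefix):
            # An embedding lookup is an index/gather, not a matmul, so skip it.
            continue
        total += math.prod(shape)
    return total


# Toy model with made-up sizes: vocab 1000, hidden dim 64.
toy_shapes = {
    "tok_embeddings.weight": (1000, 64),
    "layers.0.attention.wq.weight": (64, 64),
    "output.weight": (1000, 64),
}
print(count_params(toy_shapes))                           # 68096
print(count_params(toy_shapes, exclude_embedding=False))  # 132096
```

The output projection stays in the count even though it has the same shape as the embedding, because it is applied as a matmul.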

The issue comes from the backward computation of `aten.mul` on two complex-valued DTensors: the result is b + ai when it should be a + bi. Not...

bug
help wanted
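To make the symptom concrete, this small Python sketch uses built-in complex numbers (no DTensor involved) to compute a correct product and then shows the swapped real and imaginary parts described above. `swap_parts` is a hypothetical helper that only reproduces the wrong-looking output; it does not model where the bug actually happens.

```python
def complex_mul(x: complex, y: complex) -> complex:
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    a, b = x.real, x.imag
    c, d = y.real, y.imag
    return complex(a * c - b * d, a * d + b * c)


def swap_parts(z: complex) -> complex:
    # Hypothetical illustration of the reported symptom: a + bi comes back as b + ai.
    return complex(z.imag, z.real)


expected = complex_mul(1 + 2j, 3 + 4j)  # (3 - 8) + (4 + 6)i = -5 + 10i
buggy = swap_parts(expected)            # 10 - 5i
print(expected, buggy)
```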

FSDP + SP works fine when compile is off, but fails with the following error when compile is on:
Error log:
SP=2 ./run_llama_train.sh
+ TRAINER_DIR=/home/lty/local/torchtrain
+ MODEL=llama
+ MODEL_CONF=debugmodel
+ NGPU=8...

bug

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #180
* #285
* #161
* #172

This PR gets rid of the manual adjustment of the number of heads in attention layers,...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #287

As titled. We can simply use the HF `load_dataset` API to unify the different use cases.
1. [`load_dataset`](https://huggingface.co/docs/datasets/v2.18.0/en/package_reference/loading_methods#datasets.load_dataset) is flexible in that,...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #296

CLA Signed

This could easily be done with lazy init.

enhancement