Andrew Gu

32 issues by Andrew Gu

The recent commit https://github.com/microsoft/LoRA/commit/a0a92e0f26c067cf94747bdbf1ce73793fa44d19 flipped `A` and `B` in the comment for the LoRA `Linear` module: https://github.com/microsoft/LoRA/blob/a0a92e0f26c067cf94747bdbf1ce73793fa44d19/loralib/layers.py#L119-L125 The LoRA `Embedding` module similarly has the initialization flipped (not sure if this...
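
As a rough illustration of why the `A`/`B` initialization matters, the sketch below follows the convention from the LoRA paper: `A` is drawn from a random Gaussian and `B` starts at zero, so the low-rank update `B @ A` is exactly zero at the start of training. The helpers `matmul` and `init_lora`, the shapes, and the standard deviation are hypothetical and are not taken from `loralib`. Either ordering (zero `A` or zero `B`) gives a zero initial update, so what is at stake is that the comment and the code agree.

```python
import random


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def init_lora(d_out, d_in, r, seed=0):
    """Hypothetical helper: A is random Gaussian, B is all zeros."""
    rng = random.Random(seed)
    # A has shape (r, d_in); the std of 0.02 is an arbitrary choice for this sketch.
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
    # B has shape (d_out, r) and starts at zero.
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


A, B = init_lora(d_out=4, d_in=3, r=2)
delta_w = matmul(B, A)  # low-rank update with shape (d_out, d_in)
```

Because one factor is zero, the adapted layer initially computes the same output as the frozen base layer.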

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #120238 * #120231 This PR is not for landing as is. It is only to prototype what handling gradient norm clipping via...

oncall: distributed
release notes: distributed (fsdp)
test-config/distributed
ciflow/inductor

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #182 If we shard the embeddings as a separate FSDP parameter group, then: - In forward, we have a separate all-gather for...

CLA Signed

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125394 FSDP only runs its pre/post-forward hooks on `nn.Module.forward`. This means that if the user runs a custom method meant as a...

oncall: distributed
release notes: distributed (fsdp2)
ci-td-distributed
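
The hook behavior described in the snippet above can be mimicked without PyTorch. In this minimal sketch, `HookedModule` is a hypothetical stand-in, not FSDP or `nn.Module`: its pre/post logic wraps only `forward` (reached through `__call__`), so a custom method such as the made-up `encode` bypasses it entirely.

```python
class HookedModule:
    """Hypothetical stand-in for a module whose hooks wrap only `forward`."""

    def __init__(self):
        self.events = []

    def __call__(self, x):
        # Pre-forward step; in FSDP this is where parameters are unsharded.
        self.events.append("pre_forward")
        out = self.forward(x)
        # Post-forward step; in FSDP this is where parameters may be resharded.
        self.events.append("post_forward")
        return out

    def forward(self, x):
        self.events.append("forward")
        return x * 2

    def encode(self, x):
        # Custom entry point: called directly, so neither hook step runs.
        self.events.append("encode")
        return x + 1


m = HookedModule()
m(3)         # goes through __call__, so both hook steps run
m.encode(3)  # skips the hook steps
```

After these two calls, `m.events` records the hook steps around `forward` but nothing around `encode`, which is the gap the PR describes.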

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125484 * #125479 * #125431 **Context** We are interested in supporting the case where HSDP reduce-scatters but does not all-reduce in a...

oncall: distributed
Merged
Reverted
ciflow/trunk
ciflow/periodic
release notes: distributed (fsdp2)

I am not familiar with builds, but it seems that I cannot install `torchdistx` for any PyTorch version past 1.13 (e.g. if I am developing on top of current `master`)....

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #126497 This fixes https://github.com/pytorch/pytorch/issues/126484. We change from transformer to MLP stack since transformer seems to introduce slight numeric differences when using TP....

oncall: distributed
Merged
Reverted
ciflow/trunk
ciflow/inductor
release notes: distributed (fsdp2)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #126892 * __->__ #126887 simplify the test :) cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @penguinwu @fegin @XilunWu @wanchaol...

oncall: distributed
ciflow/trunk
topic: not user facing

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #126892 * #126887 This test shows that we could always set `reshard_after_forward=False` but manually insert calls to `module.reshard()` to implement the resharding...

oncall: distributed
ciflow/trunk
topic: not user facing

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #124764 This partially addresses https://github.com/pytorch/pytorch/issues/113794. To avoid being BC breaking, we just issue a warning when wrapping `ModuleList` or `ModuleDict`. We want...

oncall: distributed
ciflow/trunk
release notes: distributed (fsdp)
ciflow/inductor
ci-td-distributed