Brian Hirsh
This is mostly to generate discussion (maybe we want to land this? But the state of caching here feels pretty fragile). Partial fix to the issue here: https://fb.workplace.com/groups/1075192433118967/permalink/1381371379167736/ It looks like we...
Fixes https://github.com/pytorch/pytorch/issues/116433. Putting this out as a tentative fix, but more discussion is in the GitHub issue.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #116435
Partial fix for https://github.com/pytorch/pytorch/issues/120424. @int3 to continue investigation.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #120427

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy...
Fixes https://github.com/pytorch/pytorch/issues/125287. Fixes https://github.com/pytorch/pytorch/issues/124090; more context is on the issue.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #124400
* __->__ #124399
* #124398

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar...
Fixes https://github.com/pytorch/pytorch/issues/124397.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #124400
* #124399
* #124398
This came up in the FSDP2 workstream, which needs a DTensor that holds some sort of float8 tensor (cc @Chillee @ezyang @zou3519 @albanD @samdow @msaroufim @anijain2305 @chauhang @awgu / @drisspg...
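For context, here is a minimal sketch of the nesting pattern involved: an outer wrapper tensor subclass that holds an inner float8 tensor and unwraps it in `__torch_dispatch__`. `Float8Holder` is a hypothetical name for illustration only; the actual DTensor and Float8Tensor classes are much more involved.

```python
import torch
from torch.utils._pytree import tree_map

# Hypothetical wrapper subclass, only to illustrate "a subclass holding a
# float8 inner tensor"; not the real DTensor/Float8Tensor implementation.
class Float8Holder(torch.Tensor):
    @staticmethod
    def __new__(cls, inner):
        # The outer wrapper advertises fp32 while the inner data is float8.
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, dtype=torch.float32, device=inner.device
        )

    def __init__(self, inner):
        self._inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap any Float8Holder args, upcasting the inner float8 data.
        unwrap = lambda t: (
            t._inner.to(torch.float32) if isinstance(t, Float8Holder) else t
        )
        args, kwargs = tree_map(unwrap, (args, kwargs))
        return func(*args, **kwargs)

inner = torch.randn(4).to(torch.float8_e4m3fn)
wrapped = Float8Holder(inner)
print(wrapped + wrapped)  # dispatches through the wrapper, computes in fp32
```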
Fixes an error for torchtitan + internal.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #124400
* #124399
* __->__ #124398

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv...
Re-land of https://github.com/pytorch/pytorch/pull/123347. The original PR broke internal builds because of a circular import caused by importing dynamo in the DTensor code. The new version uses `torch._dynamo_disable` to work around it. Stack...
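For reference, a minimal sketch of the disable-dynamo pattern, written here with the public `torch._dynamo.disable` decorator (per the PR description, the DTensor code uses a variant that avoids importing dynamo at module import time, which is what breaks the circular import):

```python
import torch
import torch._dynamo

# Sketch only: mark a helper so dynamo skips tracing its body.
@torch._dynamo.disable
def helper(x):
    return x + 1

@torch.compile(backend="eager")
def f(x):
    # dynamo traces f but graph-breaks around the disabled helper
    return helper(x) * 2

print(f(torch.ones(2)))
```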
Example repro:
```
import torch

def f(x):
    y = x + 1
    z = torch.nn.Parameter(y)
    with torch.no_grad():
        z.mul_(2)
    return y + z

x = torch.ones(2, requires_grad=True)
out_ref = f(x)
out_test...
```
More details further down, but first, a higher-level description of "how do we functionalize storage resizing?" Today, dynamo converts `param.untyped_storage().resize_(x)` calls that it sees from FSDP into a custom...
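For context, the eager-mode pattern being functionalized looks roughly like this (a sketch of how FSDP frees and later restores a parameter's backing storage; the sizes and names are illustrative):

```python
import torch

p = torch.nn.Parameter(torch.randn(4))
nbytes = p.untyped_storage().nbytes()

# Free the parameter's backing memory (as FSDP does when the unsharded
# parameter is not currently needed). The tensor metadata stays intact,
# but its data can no longer be accessed.
p.untyped_storage().resize_(0)

# ...later, reallocate the storage before the parameter is used again.
# The memory is uninitialized until FSDP copies the gathered data back in.
p.untyped_storage().resize_(nbytes)
```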