Brian Hirsh
When functionalization is turned on in AOT Autograd, we want to hide input mutations in the graph so that the backend compiler doesn't need to worry about seeing `copy_()` ops...
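As a rough sketch of what "hiding the mutation" means here (a hypothetical trace, not the exact graph AOT Autograd produces):

```python
import torch

# User program: mutates one of its inputs in place.
def f(x):
    x.mul_(2)
    return x + 1

# Roughly what a functionalized trace could look like: purely functional ops,
# no copy_() for the backend compiler to see. The mutated input becomes an
# extra graph output instead.
def f_functionalized(x):
    x_updated = torch.mul(x, 2)
    out = torch.add(x_updated, 1)
    return x_updated, out

# The copy back into the original input happens outside the compiled graph,
# in a runtime wrapper/epilogue:
x = torch.ones(3)
x_updated, out = f_functionalized(x)
x.copy_(x_updated)
```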
One reason that functionalization can't be written as a pure graph transform is that its output can depend on the input metadata - specifically whether or not the program inputs...
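A minimal illustration (assuming the metadata in question is whether the inputs alias each other): the same Python program needs different functional graphs depending on how its inputs overlap at runtime:

```python
import torch

def f(a, b):
    a.add_(1)      # mutate a...
    return b + 1   # ...which may or may not be visible through b

base = torch.zeros(4)
f(base[:2], base[2:])   # disjoint views: the mutation of a never affects b
f(base[:3], base[1:])   # overlapping views: the mutation of a is visible through b,
                        # so the functionalized graph has to propagate it
```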
This seems like a minor issue, but the following code breaks:

```python
def foo(x):
    z = torch.zeros(1)  # factory func allocating on cpu
    z.copy_(x)          # cuda_tensor.copy_(cpu_tensor)
    return z.sum()

x = ...
```
Better description coming soon (but this is meant to fix https://github.com/pytorch/pytorch/issues/91093).
tl;dr: this should fix some minor perf regressions that were caused by adding more `as_strided()` calls in AOT Autograd. This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`. Context: AOT Autograd...
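A minimal usage sketch (the exact call signature here, taking a `True`/`False` flag, is an assumption based on how similar torch toggles work):

```python
import torch

base = torch.ones(4, requires_grad=True)
x = base * 1   # non-leaf tensor, so in-place ops on its views are allowed

# Inside this region, autograd regenerates views by replaying the original
# view op (here, .view()) instead of reconstructing them with as_strided().
with torch.autograd._set_view_replay_enabled(True):
    y = x.view(2, 2)
    y.mul_(2)   # in-place op on a view forces the view to be regenerated

y.sum().backward()
print(base.grad)   # gradient flows back through the replayed view
```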
Still waiting on CI fallout. Fixes #90759.
I spent some time trying to see what it would take to `torch.compile()` a module that uses tensor subclasses, with the torchquant repo as my test example. I have a...
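For context, the kind of tensor subclass in question is a `__torch_dispatch__` wrapper subclass along these lines (a generic sketch, not the actual torchquant implementation):

```python
import torch
from torch.utils._pytree import tree_map

class WrapperTensor(torch.Tensor):
    """Minimal __torch_dispatch__ wrapper subclass (illustrative only)."""

    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device,
            requires_grad=elem.requires_grad)

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}

        def unwrap(t):
            return t.elem if isinstance(t, WrapperTensor) else t

        def wrap(t):
            return WrapperTensor(t) if isinstance(t, torch.Tensor) else t

        # Unwrap subclass args, run the underlying op, re-wrap tensor outputs.
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))
        return tree_map(wrap, out)
```

Getting `torch.compile()` to trace a module whose parameters or activations are wrapped in a subclass like this is the exercise being described.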
Fixes https://github.com/pytorch/pytorch/issues/119755. We are prototyping whether this situation is going to be exercised more heavily when tracing FSDP. This slightly relaxes the assertion for the case in AOTAutograd when...
Pre-emptive test in OSS to ensure that models relying on the "non-overlapping guards" checks do not suffer drastic guard-evaluation slowdowns. Current plan is to follow up on this with...
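A rough illustration of the scenario being measured (a hypothetical repro, not the actual test models): programs with many tensor inputs where at least one input is mutated, so the compiled artifact has to guard that the inputs don't overlap in memory, and the cost of evaluating those guards grows with the number of inputs:

```python
import torch

# Many tensor inputs plus an input mutation means the compiled artifact has to
# guard that the inputs don't overlap in memory.
@torch.compile
def f(xs):
    xs[0].add_(1)   # input mutation -> aliasing/overlap analysis on the inputs
    total = xs[0].sum()
    for x in xs[1:]:
        total = total + x.sum()
    return total

inputs = [torch.randn(16) for _ in range(256)]  # many inputs -> many guards
f(inputs)   # first call compiles
f(inputs)   # later calls pay the guard-evaluation cost on every invocation
```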