Brian Hirsh
[torch.compile] Mutating backward input of autograd function is not supported by ```torch.compile```
We started supporting mutations of module buffers in the backward (float8 needed it for delayed scaling), but we didn't extend the support to tangents. We should be able to support...
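For context, a minimal sketch of the unsupported pattern (the function and names here are illustrative, not the original repro): an `autograd.Function` whose backward mutates its incoming gradient (a tangent) in place, run under `torch.compile`.

```python
import torch

# Illustrative sketch (not the original repro): an autograd.Function whose
# backward mutates its input gradient (a tangent) in place -- the pattern
# torch.compile does not currently support.
class MutatingBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        grad_out.mul_(2)  # in-place mutation of a backward input (tangent)
        return grad_out

@torch.compile
def f(x):
    return MutatingBackward.apply(x).sum()

x = torch.randn(4, requires_grad=True)
f(x).backward()  # expected to error / fall back on current builds
```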
PR'd https://github.com/pytorch/pytorch/pull/141131
@pytorchbot merge
Hmm @JackCaoG, the grad accumulation impl is here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/functions/accumulate_grad.h#L116
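For anyone skimming, here's a rough Python-level approximation of what that AccumulateGrad code does (a sketch of the observable behavior, not a translation of the C++): the first backward populates `.grad`, and later backwards accumulate into it rather than overwrite it.

```python
import torch

# Observable AccumulateGrad behavior: .grad is set on the first backward
# and accumulated (not overwritten) on subsequent backwards.
p = torch.randn(3, requires_grad=True)

(p * 2).sum().backward()
print(p.grad)  # tensor([2., 2., 2.])

(p * 3).sum().backward()
print(p.grad)  # tensor([5., 5., 5.]) -- 2 + 3, accumulated in place
```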
Hmm @JackCaoG, can you try out this patch?

```diff
diff --git a/aten/src/ATen/FunctionalStorageImpl.cpp b/aten/src/ATen/FunctionalStorageImpl.cpp
index 3275c8f447f..d788aa29261 100644
--- a/aten/src/ATen/FunctionalStorageImpl.cpp
+++ b/aten/src/ATen/FunctionalStorageImpl.cpp
@@ -6,6 +6,13 @@
 #include
 #include
+#ifndef AT_PER_OPERATOR_HEADERS
+#include...
```
FWIW @GLivshits - on a nightly, I get a different error when running that repro (this also seems like something worth looking into, I'm just surprised that it's different from...
@drisspg well I tried running with all 3 of these context managers and I get the same error (the code above is just using `nullcontext()`):

```python
with nullcontext():
with torch.nn.attention.sdpa_kernel([torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION]):...
```
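In case it's useful, here's a self-contained sketch of cycling through the SDPA backends explicitly (shapes and dtype are made up, and this assumes a CUDA build; it isn't the repro itself):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Illustrative only: run scaled_dot_product_attention under each backend's
# context manager in turn. Shapes and dtype are arbitrary.
q = k = v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

for backend in (SDPBackend.FLASH_ATTENTION,
                SDPBackend.EFFICIENT_ATTENTION,
                SDPBackend.MATH):
    with sdpa_kernel([backend]):
        out = F.scaled_dot_product_attention(q, k, v)
        print(backend, out.shape)
```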
Going to mark the (reproducible) error above as hi-pri, since this appears to be a regression.
One update: I noticed that when I take the repro from https://github.com/pytorch/pytorch/issues/133571#issuecomment-2298654162 and tweak it to force static shapes (i.e. turn off dynamic shapes, with `torch.compile(..., dynamic=False)`), the error goes away. This might be a...
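For reference, a stand-in illustration of the knob being toggled (the function below is made up, not the actual repro): `dynamic=False` tells `torch.compile` to specialize on static shapes instead of tracing symbolic ones.

```python
import torch

# Stand-in function, not the repro: compare default compilation against
# forced static-shape specialization.
def f(x):
    return x.sin() + x.cos()

f_default = torch.compile(f)                # default: may switch to dynamic shapes on recompile
f_static = torch.compile(f, dynamic=False)  # force static-shape specialization

x = torch.randn(16)
print(torch.allclose(f_default(x), f_static(x)))
```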