Brian Hirsh

Results: 151 comments by Brian Hirsh

We started supporting mutations of module buffers in the backward (float8 needed it for delayed scaling), but we didn't extend the support to tangents. We should be able to support...
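For context, here is a minimal sketch of the buffer-mutation-in-backward pattern being described (hypothetical names, not the actual float8 code): a custom `autograd.Function` writes a gradient statistic into a module buffer during `backward`, the way delayed scaling records amax values for the next iteration.

```python
import torch

class ScaleTrackingMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, amax_buffer):
        ctx.save_for_backward(x, weight)
        ctx.amax_buffer = amax_buffer  # buffer needs no grad, stash it on ctx
        return x @ weight

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # The mutation in question: write the max-abs of the incoming
        # gradient into a module buffer, inside the backward pass.
        ctx.amax_buffer.fill_(grad_out.abs().max())
        return grad_out @ weight.t(), x.t() @ grad_out, None

class Linearish(torch.nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(in_f, out_f))
        self.register_buffer("grad_amax", torch.zeros(()))

    def forward(self, x):
        return ScaleTrackingMatmul.apply(x, self.weight, self.grad_amax)
```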

Hmm @JackCaoG, the grad accumulation impl is here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/functions/accumulate_grad.h#L116
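In Python pseudocode, the core semantics of that C++ accumulation path look roughly like this (a simplified sketch; the real code also special-cases sparse grads and steals the incoming buffer when it can):

```python
def accumulate_grad(param, new_grad):
    if param.grad is None:
        # First accumulation: stow the incoming gradient
        # (the C++ impl may steal its storage instead of copying).
        param.grad = new_grad.detach().clone()
    else:
        # Subsequent accumulations add in place.
        param.grad += new_grad
```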

hmm @JackCaoG, can you try out this patch?

```diff
diff --git a/aten/src/ATen/FunctionalStorageImpl.cpp b/aten/src/ATen/FunctionalStorageImpl.cpp
index 3275c8f447f..d788aa29261 100644
--- a/aten/src/ATen/FunctionalStorageImpl.cpp
+++ b/aten/src/ATen/FunctionalStorageImpl.cpp
@@ -6,6 +6,13 @@
 #include
 #include
+#ifndef AT_PER_OPERATOR_HEADERS
+#include
...
```

fwiw @GLivshits - on a nightly, I get a different error when running that repro (this also seems like something worth looking into, I'm just surprised that it's different from...

@drisspg well I tried running with all 3 of these context managers and I get the same error (the code above is just using `nullcontext()`):

```python
with nullcontext():
    with torch.nn.attention.sdpa_kernel([torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION]):
        ...
```
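For reference, a self-contained sketch of what pinning the backend with `sdpa_kernel` looks like (the actual repro is truncated above; the tensor shapes and CUDA device here are placeholder assumptions):

```python
import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

# Placeholder (batch, heads, seq, head_dim) inputs on CUDA, where the
# efficient-attention backend is available.
q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Restrict scaled_dot_product_attention to a single backend for this region.
with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION]):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```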

going to mark the (repro-able) error above as hi-pri since this appears to be a regression

one update: I noticed that when I take the repro from https://github.com/pytorch/pytorch/issues/133571#issuecomment-2298654162 and tweak it to force-turn-off dynamic shapes (with `torch.compile(..., dynamic=False)`), the error goes away. This might be a...
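A tiny illustration of that tweak (not the issue's repro, just the flag's effect):

```python
import torch

# dynamic=False disables automatic dynamic shapes: every distinct input
# shape gets its own static-shape compile instead of a symbolic one.
@torch.compile(dynamic=False)
def f(x):
    return x.sin() + 1

f(torch.randn(4))  # compiles for shape (4,)
f(torch.randn(8))  # recompiles for shape (8,) rather than going dynamic
```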