Natalia Gimelshein

Results: 214 comments of Natalia Gimelshein

@d4l3k Using more memory than the math predicts is unexpected (and, tbh, torch.compile being faster than flash attention is unexpected too). Is it possible to look at the full model?

@d4l3k can you please open an issue?

Does inductor use cudagraphs? The profiler and cudagraphs don't work together; it's an old and still-unfixed issue, see e.g. https://github.com/pytorch/torchdynamo/issues/1413

We have an open (and high-priority) issue for this, #75504. It has a suggested workaround that isn't workable in most cases, and no other activity.

Pointwise ops are pretty reliably optimized by inductor, and new fused variants won't be added to pytorch core.

The high-water mark of memory is reached toward the end of the forward pass and is determined by what needs to be saved for the backward pass. Making some intermediate operations...
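As a toy illustration of that memory profile (plain Python with hypothetical names, not PyTorch internals): if every layer's activation is saved for backward, live memory grows through the forward pass and peaks at its end, then shrinks as backward consumes the saved activations.

```python
# Toy model (not PyTorch internals): track live memory through a
# forward/backward pass where each layer's activation is saved for backward.
def peak_memory(activation_sizes):
    """Return (peak, timeline) of live memory.

    Live memory grows monotonically through forward (activations are
    kept for backward) and peaks at the end of the forward pass.
    """
    live = 0
    timeline = []
    for size in activation_sizes:
        live += size          # activation saved for backward
        timeline.append(live)
    peak = live               # high-water mark: end of forward
    # backward walks the layers in reverse, freeing each saved activation
    for size in reversed(activation_sizes):
        live -= size
        timeline.append(live)
    return peak, timeline

peak, timeline = peak_memory([4, 8, 2, 6])
# peak == 20: the sum of everything saved for backward
```

In this toy accounting, shrinking or recomputing any saved intermediate directly lowers the peak, which is why recomputation/checkpointing targets the forward-pass saves.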

Running the full suites, but I think we can just wait for the dashboard to update.

Alternatively, could we improve AsStridedBackward for the cases where we can easily see that all of the original elements participated, so there's no need to pre-fill with 0s before copying the grad?
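A plain-Python sketch of that idea for the 1-D case (a hypothetical helper, not the real AsStridedBackward, which handles N dimensions and overlapping strides): when the strided view covers every input element exactly once, the input gradient is just a permutation of `grad`, so the general path's zero pre-fill plus accumulation is unnecessary.

```python
# Sketch only: 1-D as_strided backward with a fast path for exact coverage.
def as_strided_backward(grad, input_len, stride, offset=0):
    # Input indices the strided view read from (1-D case for simplicity).
    idx = [offset + i * stride for i in range(len(grad))]
    if sorted(idx) == list(range(input_len)):
        # Fast path: every input element participated exactly once, so the
        # grad maps one-to-one onto the input -- an inverse permutation,
        # with no need to pre-fill with zeros or accumulate.
        inv = [None] * input_len
        for pos, j in enumerate(idx):
            inv[j] = grad[pos]
        return inv
    # General path: pre-fill with zeros, then accumulate (handles elements
    # the view never touched, and overlapping reads that must sum).
    out = [0.0] * input_len
    for pos, j in enumerate(idx):
        out[j] += grad[pos]
    return out

as_strided_backward([1, 2, 3], 3, 1)  # fast path: [1, 2, 3]
as_strided_backward([1, 2, 3], 5, 2)  # general path: zeros where untouched
```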

So this is all looking good, but I wonder what will happen to optimizers once this is turned on. Optimizers currently do in-place updates, which inductor handles, and with this...