peterbell10

Results: 134 comments by peterbell10

> I think it will be great if we can provide a torch.compile mode that turns off all the optimizations known to cause large numerical differences. When users find torch.compile...

No significant swings on the benchmark results for disabling fp-contraction. Will have to dig into these torchbench accuracy failures though.

The issues seen with scaled softmax should have been resolved by #124119, so we're looking at the general case here; and it seems like the fma is generally improving precision....
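To illustrate why a fused multiply-add can improve precision, here is a minimal pure-Python sketch. It is not taken from the PR or from inductor. The helper names `unfused_mul_add` and `fused_mul_add` are hypothetical, and the fused version is emulated with exact rational arithmetic rather than a hardware `fma` instruction. The point is only that `a * b + c` rounds twice, while an fma rounds once.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    # Emulated fma (assumption: stands in for a hardware fma instruction).
    # The product and sum are computed exactly, then rounded once.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# The exact product is 1 - 2**-60, which is not representable as a double
# and rounds to 1.0, so the unfused result loses the small term entirely.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = unfused_mul_add(a, b, c)  # 0.0
fused = fused_mul_add(a, b, c)      # -2**-60, the exact answer

print("unfused:", unfused)
print("fused:  ", fused)
```

The inputs here are chosen to make the cancellation visible. With typical data the difference is a fraction of a unit in the last place, which is why results with and without fp-contraction can differ without either being wrong.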

Would you prefer there to be a `torch.fma` and `aten.fma` which is a core ATen op?

> What I wonder is whether it makes more sense to have FMA on by default and then have a torch.nofma op for when it may be an issue. That...

One thing worth mentioning is that this manual `fma` insertion actually makes inductor closer to eager. Eager kernels are compiled generating `fma` instructions, but only for ops within a given...

Yes, because you pass the scheduler explicitly instead of via thread-local storage, it should be equivalent.

It might help if you could grab some stack traces of where it hangs. However, if the bug really is in the compiler and/or system libraries then I'm not sure...