peterbell10
Yeah I'll give that a shot as well.
> I think it will be great if we can provide a torch.compile mode that turns off all the optimizations known to cause large numerical differences. When users find torch.compile...
No significant swings in the benchmark results from disabling fp-contraction. Will have to dig into these torchbench accuracy failures, though.
The issues seen with scaled softmax should have been resolved by #124119, so we're looking at the general case here; and it seems like the fma is generally improving precision....
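For anyone following along, here is a small synthetic illustration (not taken from the benchmarks above) of why contracting `a * b + c` into a fused multiply-add usually helps precision: the fma rounds once instead of twice. The float64 path below just emulates the single rounding of an fma for float32 inputs.

```python
import numpy as np

# 1 + 2^-12, exactly representable in float32
a = np.float32(1.0) + np.float32(2.0) ** -12

# Multiply then add: the product is rounded to float32 before the subtraction.
mul_then_add = a * a - np.float32(1.0)

# fma-like: compute the product and sum exactly in float64, round once at the end.
fma_like = np.float32(np.float64(a) * np.float64(a) - 1.0)

print(mul_then_add)  # 2^-11           = 0.00048828125
print(fma_like)      # 2^-11 + 2^-24  ~= 0.00048834085 (exact answer)
```

The two results differ by roughly one part in 2^13, far above float32 machine epsilon, so which form a backend emits can easily show up in accuracy comparisons.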
@pytorchbot rebase
Would you prefer there to be a `torch.fma` and an `aten.fma` that is a core ATen op?
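If it helps the discussion, here is a rough sketch of what the Python-surface semantics could look like, written as an out-of-tree custom op. The `mylib::fma` namespace is made up and the wide-dtype emulation is only a reference; a real `aten.fma` would be a proper ATen kernel with an inductor lowering.

```python
import torch

@torch.library.custom_op("mylib::fma", mutates_args=())
def fma(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # Reference semantics: a * b + c rounded once, emulated by computing in a
    # wider dtype and rounding back to the input dtype at the end.
    wide = torch.promote_types(a.dtype, torch.float64)
    return (a.to(wide) * b.to(wide) + c.to(wide)).to(a.dtype)

@fma.register_fake
def _(a, b, c):
    # Assumes same-shape inputs for simplicity; a real op would broadcast.
    return torch.empty_like(a)
```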
> What I wonder is whether it makes more sense to have FMA on by default and then have a torch.nofma op for when it may be an issue. That...
One thing worth mentioning is that this manual `fma` insertion actually makes inductor closer to eager. Eager kernels are compiled to generate `fma` instructions, but only for ops within a given...
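To make the "within a given kernel" point concrete (illustrative only, not a claim about which eager kernels actually contract today):

```python
import torch

a, b, c = (torch.randn(1024) for _ in range(3))

# Two separate eager kernels: the product is rounded to the output dtype
# before the add ever sees it, so there is nothing for the host compiler
# to contract across.
two_kernels = a * b + c

# A single eager kernel computing c + a * b; whether the multiply-add inside
# that kernel becomes an fma depends on how the kernel was built
# (fp-contraction flags), which is exactly the per-op behaviour above.
one_kernel = torch.addcmul(c, a, b)
```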
Yes, because you pass the scheduler explicitly instead of via thread-local storage, it should be equivalent.
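(Generic illustration of the two plumbing styles being compared; this is not the code under discussion.)

```python
import threading

_tls = threading.local()

def run_with_tls(scheduler, fn):
    # Implicit style: stash the current scheduler in thread-local storage
    # around the call so callees can look it up.
    _tls.scheduler = scheduler
    try:
        return fn()
    finally:
        del _tls.scheduler

def run_explicit(scheduler, fn):
    # Explicit style: hand the scheduler straight to the callee. As long as
    # the same scheduler object reaches the same code, the two are equivalent.
    return fn(scheduler)
```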
It might help if you could grab some stack traces of where it hangs. However, if the bug really is in the compiler and/or system libraries then I'm not sure...
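For grabbing those stack traces, one low-overhead option at the Python level (assuming you can add a couple of lines to the repro) is `faulthandler`; if the hang turns out to be down in native code, attaching `gdb` to the stuck process and running `thread apply all bt` gives the native side.

```python
import faulthandler
import signal

# Dump all Python thread stacks on demand: run the repro, then
# `kill -USR1 <pid>` from another shell once it hangs.
faulthandler.register(signal.SIGUSR1)

# Or dump them automatically if the process is still alive after a timeout,
# which is handy when the hang only shows up in CI.
faulthandler.dump_traceback_later(timeout=600, exit=False)
```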