peterbell10
Yeah I'll give that a shot as well.
> I think it will be great if we can provide a torch.compile mode that turns off all the optimizations known to cause large numerical differences. When users find torch.compile...
No significant swings in the benchmark results from disabling fp-contraction. Will have to dig into these torchbench accuracy failures, though.
The issues seen with scaled softmax should have been resolved by #124119, so we're looking at the general case here; and it seems like the fma is generally improving precision....
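For anyone following along, here is a small synthetic illustration (not taken from the benchmarks above) of why contracting `a * b + c` into a fused multiply-add usually helps precision: the fma rounds once instead of twice. The float64 path below just emulates the single rounding of an fma for float32 inputs.

```python
import numpy as np

# 1 + 2^-12, exactly representable in float32
a = np.float32(1.0) + np.float32(2.0) ** -12

# Multiply then add: the product is rounded to float32 before the subtraction.
mul_then_add = a * a - np.float32(1.0)

# fma-like: compute the product and sum exactly in float64, round once at the end.
fma_like = np.float32(np.float64(a) * np.float64(a) - 1.0)

print(mul_then_add)  # 2^-11           = 0.00048828125
print(fma_like)      # 2^-11 + 2^-24  ~= 0.00048834085 (exact answer)
```

The two results differ by roughly one part in 2^13, far above float32 machine epsilon, so which form a backend emits can easily show up in accuracy comparisons.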
@pytorchbot rebase
Would you prefer there to be a `torch.fma` and an `aten.fma` that is a core ATen op?
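If it helps the discussion, here is a rough sketch of what the Python-surface semantics could look like, written as an out-of-tree custom op. The `mylib::fma` namespace is made up and the wide-dtype emulation is only a reference; a real `aten.fma` would be a proper ATen kernel with an inductor lowering.

```python
import torch

@torch.library.custom_op("mylib::fma", mutates_args=())
def fma(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # Reference semantics: a * b + c rounded once, emulated by computing in a
    # wider dtype and rounding back to the input dtype at the end.
    wide = torch.promote_types(a.dtype, torch.float64)
    return (a.to(wide) * b.to(wide) + c.to(wide)).to(a.dtype)

@fma.register_fake
def _(a, b, c):
    # Assumes same-shape inputs for simplicity; a real op would broadcast.
    return torch.empty_like(a)
```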
> What I wonder is whether it makes more sense to have FMA on by default and then have a torch.nofma op for when it may be an issue. That...
One thing worth mentioning is that this manual `fma` insertion actually makes inductor closer to eager. Eager kernels are compiled to generate `fma` instructions, but only for ops within a given...
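To make the "within a given kernel" point concrete (illustrative only, not a claim about which eager kernels actually contract today):

```python
import torch

a, b, c = (torch.randn(1024) for _ in range(3))

# Two separate eager kernels: the product is rounded to the output dtype
# before the add ever sees it, so there is nothing for the host compiler
# to contract across.
two_kernels = a * b + c

# A single eager kernel computing c + a * b; whether the multiply-add inside
# that kernel becomes an fma depends on how the kernel was built
# (fp-contraction flags), which is exactly the per-op behaviour above.
one_kernel = torch.addcmul(c, a, b)
```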
Yes, because you pass the scheduler explicitly instead of via thread-local storage, it should be equivalent.
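(Generic illustration of the two plumbing styles being compared; this is not the code under discussion.)

```python
import threading

_tls = threading.local()

def run_with_tls(scheduler, fn):
    # Implicit style: stash the current scheduler in thread-local storage
    # around the call so callees can look it up.
    _tls.scheduler = scheduler
    try:
        return fn()
    finally:
        del _tls.scheduler

def run_explicit(scheduler, fn):
    # Explicit style: hand the scheduler straight to the callee. As long as
    # the same scheduler object reaches the same code, the two are equivalent.
    return fn(scheduler)
```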
It might help if you could grab some stack traces of where it hangs. However, if the bug really is in the compiler and/or system libraries then I'm not sure...
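For grabbing those stack traces, one low-overhead option at the Python level (assuming you can add a couple of lines to the repro) is `faulthandler`; if the hang turns out to be down in native code, attaching `gdb` to the stuck process and running `thread apply all bt` gives the native side.

```python
import faulthandler
import signal

# Dump all Python thread stacks on demand: run the repro, then
# `kill -USR1 <pid>` from another shell once it hangs.
faulthandler.register(signal.SIGUSR1)

# Or dump them automatically if the process is still alive after a timeout,
# which is handy when the hang only shows up in CI.
faulthandler.dump_traceback_later(timeout=600, exit=False)
```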