Matt Arsenault

Results: 57 comments by Matt Arsenault

This is most likely the issue that https://github.com/llvm/llvm-project/pull/86012 should fix

The diagnostic is garbage (and absolutely should not print a stack trace; report_fatal_error's default is broken), but this does hit a proper error. The original assertion is gone.

We're in the middle of overhauling the unsafe atomic handling (see https://github.com/llvm/llvm-project/pull/85052). I'm not sure the handling here was ever updated properly for gfx11. I'm planning on fixing all...

> I'll close it as specifying syncscope fixes the issue. Thanks!

Note the attribute was just removed in edded8d7b5cb310524494cca317dd3582234b56f. You should now specify some combination of !amdgpu.no.fine.grained.memory, !amdgpu.no.remote.memory, and...
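As a rough illustration of what attaching this metadata to an atomic can look like, here is a minimal IR sketch. It uses only the two metadata names that are visible in the excerpt above; the function name, the "agent" syncscope, the global address space, and the monotonic ordering are arbitrary choices made for the example, not something the comment prescribes.

```llvm
; Hypothetical example function; names and operand choices are illustrative.
define float @atomic_fadd_example(ptr addrspace(1) %ptr, float %val) {
  ; The metadata is attached directly to the atomicrmw instruction.
  ; Each annotation points at an empty metadata node.
  %old = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory !0
  ret float %old
}

!0 = !{}
```

Which combination is appropriate depends on what the program can actually guarantee about the memory being accessed, so treat the pairing above as a placeholder rather than a recommendation.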

The global __HIP_FTZ/__CUDA_FTZ should imply the global FP mode setting. If those are enabled you should be able to set denormal-fp-math-f32=preserve-sign,preserve-sign on every function and the backend will handle this....
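A minimal sketch of what setting this on a function looks like in IR, assuming the frontend emits it as a string function attribute. The function itself is a made-up placeholder; only the attribute name and value come from the comment above.

```llvm
; Hypothetical function used only to carry the attribute.
define float @scale_by_two(float %x) #0 {
  %r = fmul float %x, 2.0
  ret float %r
}

; f32 denormal mode as "output,input": both set to preserve-sign (flush to zero).
attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
```

In a real module the same attribute group would be referenced by every function definition, which is what "on every function" amounts to in practice.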

> > How is this supposed tested? I don't see any lit tests
>
> I don't see any lit tests for denorm handling on the NV side.

Bad test...

Tricks like turning FP atomics into integer atomics would be best done in the backend (though FP atomicrmw does not have fast math flags, so that's a bit annoying). We...

> I'm looking at the starter patch and I'm still not quite sure where this code needs to go. I believe it should go in the switch statement, specifically after...

You need to come up with some IR that has an operand that can be simplified. https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll is one example. In this case it's more difficult because the obvious optimizations...