Matt Arsenault comments

Results: 57 comments by Matt Arsenault

[AMDGPU] Wrong O0 codegen for workgroup id x + control flow

This is most likely the issue that https://github.com/llvm/llvm-project/pull/86012 should fix.

llc crashes when given invalid command line arguments

The diagnostic is garbage (and absolutely should not print a stack trace; `report_fatal_error`'s default behavior is broken), but this does hit a proper error. The original assertion is gone.

[Issue]: enabling `amdgpu-unsafe-fp-atomics` for gfx90a

We're in the middle of overhauling the unsafe atomic handling (e.g., see https://github.com/llvm/llvm-project/pull/85052). I'm not sure the handling here was ever updated properly for gfx11. I'm planning on fixing all...

[Issue]: enabling `amdgpu-unsafe-fp-atomics` for gfx90a

> I'll close it as specifying syncscope fixes the issue. Thanks!

Note the attribute was just removed in edded8d7b5cb310524494cca317dd3582234b56f. You should now specify some combination of `!amdgpu.no.fine.grained.memory`, `!amdgpu.no.remote.memory`, and...

[AMD] Handle denorms properly for exp2 and exp

The global `__HIP_FTZ`/`__CUDA_FTZ` should imply the global FP mode setting. If those are enabled, you should be able to set `denormal-fp-math-f32=preserve-sign,preserve-sign` on every function and the backend will handle this....
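To make concrete what `preserve-sign` asks for (the pair covers denormal results and denormal inputs), here is a small software model, assuming IEEE-754 binary32 `float`. The helpers `flush_denormal_preserve_sign` and `ftz_mul` are hypothetical names for illustration only; the backend does not implement the mode with code like this.

```cpp
#include <cmath>

// Hypothetical helper modelling "preserve-sign" for one f32 value: a subnormal
// becomes a zero with the same sign; zeros, normals, infinities and NaNs pass through.
float flush_denormal_preserve_sign(float x) {
    if (std::fpclassify(x) == FP_SUBNORMAL) {
        return std::copysign(0.0f, x);
    }
    return x;
}

// Approximate model of one f32 multiply with preserve-sign applied to both the
// inputs and the result (host arithmetic, then flush; not what the backend emits).
float ftz_mul(float a, float b) {
    float product = flush_denormal_preserve_sign(a) * flush_denormal_preserve_sign(b);
    return flush_denormal_preserve_sign(product);
}
```

For example, `ftz_mul(std::ldexp(1.0f, -70), std::ldexp(1.0f, -70))` yields `0.0f`, because the exact product 2^-140 is subnormal and gets flushed.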

[AMD] Handle denorms properly for exp2 and exp

> > How is this supposed to be tested? I don't see any lit tests
>
> I don't see any lit tests for denorm handling on the NV side.

Bad test...

Implement float/double atomicMin/Max in terms of integer atomics

Tricks like turning FP atomics into integer atomics would be best done in the backend (though FP atomicrmw does not have fast math flags, so that's a bit annoying). We...
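The integer-atomic trick in the title is commonly written as: if the operand's sign bit is clear, do a signed integer max on its bit pattern; if it is set, do an unsigned integer min. Below is a minimal sketch of an f32 atomic max built that way, assuming IEEE-754 binary32 `float` and no NaN operands. It models the transformation at the source level and is not the code from the issue. `std::atomic` has no `fetch_max`/`fetch_min` before C++26, so the hypothetical helpers `fetch_max_signed` and `fetch_min_unsigned` emulate them with compare-exchange loops; on GPUs, integer atomic max/min are typically single instructions.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Reinterpret float bits as uint32 and back without undefined behavior.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: atomic max comparing bit patterns as signed ints.
// Returns the previous value; compare_exchange_weak reloads `old` on failure.
std::uint32_t fetch_max_signed(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    std::uint32_t old = a.load();
    while (static_cast<std::int32_t>(old) < static_cast<std::int32_t>(v) &&
           !a.compare_exchange_weak(old, v)) {
    }
    return old;
}

// Hypothetical helper: atomic min comparing bit patterns as unsigned ints.
std::uint32_t fetch_min_unsigned(std::atomic<std::uint32_t>& a, std::uint32_t v) {
    std::uint32_t old = a.load();
    while (v < old && !a.compare_exchange_weak(old, v)) {
    }
    return old;
}

// Atomic float max using only integer atomics; returns the previous value.
float atomic_fmax(std::atomic<std::uint32_t>& a, float value) {
    assert(!std::isnan(value) && "NaN operands are not handled by this sketch");
    std::uint32_t bits = float_bits(value);
    // Sign bit clear: signed integer order matches float order against any
    // stored value (stored negatives are negative as signed ints).
    // Sign bit set (including -0.0f): unsigned min picks the negative value of
    // smaller magnitude and never replaces a stored value whose sign bit is clear.
    std::uint32_t old = std::signbit(value) ? fetch_min_unsigned(a, bits)
                                            : fetch_max_signed(a, bits);
    return bits_to_float(old);
}
```

An atomic min is the mirror image: signed min when the sign bit is clear, unsigned max when it is set. The f64 version follows the same pattern on 64-bit integers.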

X86: Avoid some uses of getPointerTy

This stack of pull requests is managed by Graphite: **#146306** (this PR), on top of `main`.

SimplifyDemandedBits should handle fneg/fabs/fcopysign

> I'm looking at the starter patch and I'm still not quite sure where this code needs to go. I believe it should go in the switch statement, specifically after...

SimplifyDemandedBits should handle fneg/fabs/fcopysign

You need to come up with some IR that has an operand that can be simplified. https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll is one example. In this case it's more difficult because the obvious optimizations...
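The rule being asked for follows from `fneg`, `fabs` and `fcopysign` only ever changing the sign bit of their result. Here is a minimal bit-level sketch, assuming binary32 values. All helper names are hypothetical, and this models the reasoning rather than LLVM's actual `SimplifyDemandedBits` code. If no user demands the result's sign bit (for example, the value is bitcast to an integer and masked with `0x7fffffff`), the operation can be replaced by its first operand. The bits demanded from the operands shrink accordingly.

```cpp
#include <cstdint>

constexpr std::uint32_t kSignBit = 0x80000000u;

// Bit-level models of the three operations on binary32 bit patterns.
std::uint32_t fneg_bits(std::uint32_t x) { return x ^ kSignBit; }
std::uint32_t fabs_bits(std::uint32_t x) { return x & ~kSignBit; }
std::uint32_t fcopysign_bits(std::uint32_t mag, std::uint32_t sgn) {
    return (mag & ~kSignBit) | (sgn & kSignBit);
}

enum class SignOp { FNeg, FAbs, FCopySign };

// All three ops leave every non-sign bit of operand 0 unchanged, so if the
// result's sign bit is not demanded, the result can be replaced by operand 0.
bool can_replace_with_operand0(std::uint32_t demanded_result) {
    return (demanded_result & kSignBit) == 0;
}

// Bits of operand 0 that can affect the demanded bits of the result.
std::uint32_t demanded_operand0(SignOp op, std::uint32_t demanded_result) {
    switch (op) {
    case SignOp::FNeg:
        return demanded_result;              // sign is flipped, but still read
    case SignOp::FAbs:
    case SignOp::FCopySign:
        return demanded_result & ~kSignBit;  // input sign never reaches the result
    }
    return demanded_result;
}

// For fcopysign, only the sign bit of operand 1 matters, and only if the
// result's sign bit is demanded.
std::uint32_t demanded_copysign_operand1(std::uint32_t demanded_result) {
    return demanded_result & kSignBit;
}
```

For instance, `can_replace_with_operand0(0x7fffffff)` is true, so under such a mask any of the three operations could be dropped. A lit test needs IR where the operand feeding one of these operations has exactly this kind of restricted use.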