GPUifyLoops.jl
Fuse multiply-add
LLVM needs to know that the fadd is fast so that the MulAdd pass can do its thing. How do we use fma without making the code ugly?
Should users add MuladdMacro.jl to their own code?
Might be fixed by https://github.com/vchuravy/GPUifyLoops.jl/pull/55
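
For reference, a minimal sketch of the options in plain Julia, using only `Base.muladd` and `Base.fma`. The function names `axpy_plain`, `axpy_muladd`, and `axpy_fma` are made up for illustration and are not part of GPUifyLoops.jl. As far as I understand it, `@muladd` from MuladdMacro.jl rewrites expressions of the first form into the second, so the source keeps reading like ordinary arithmetic.

```julia
# Plain form: a separate multiply and add. LLVM only fuses these
# when the operations carry flags that permit contraction.
axpy_plain(a, x, y) = a * x + y

# Explicit form: Base.muladd permits (but does not force) a fused
# multiply-add, without enabling fast-math for the surrounding code.
axpy_muladd(a, x, y) = muladd(a, x, y)

# Base.fma always computes a*x + y with a single rounding, falling
# back to a slower software path if the hardware has no fma instruction.
axpy_fma(a, x, y) = fma(a, x, y)

println(axpy_plain(2.0, 3.0, 1.0))
println(axpy_muladd(2.0, 3.0, 1.0))
println(axpy_fma(2.0, 3.0, 1.0))
```

Writing `muladd` by hand is what makes the code ugly, which is why a macro or a fix inside GPUifyLoops.jl itself would be preferable.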