accelerate-llvm Allow influencing Clang/LLVM options and code generation

Problem

Power users who want to get the most out of their program may want to tune the optimisations done by LLVM. For example, @exaexa determined from the assembly output that LLVM's auto-vectorisation pass was not skipping epilogue vectorisation in a loop where this would have been highly beneficial. Tuning epilogue-vectorization-minimum-VF (link) made their program faster.

Relatedly, accelerate-llvm currently does not set any fast math flags in the generated LLVM IR. This makes some applications (relatively) slow; most damningly, sum is currently not vectorised in Accelerate for this reason.
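To illustrate why a floating-point sum cannot be vectorised without permission to reassociate: a vectorised reduction keeps several partial sums and combines them at the end, which changes the order of the additions, and floating-point addition is not associative. The sketch below uses plain Haskell lists rather than Accelerate arrays, and the two function names are made up for this illustration only.

```haskell
-- Illustration only: plain Haskell lists, not Accelerate arrays.
-- Both function names are hypothetical.

-- Strict left-to-right sum: the evaluation order of a scalar loop.
sumSequential :: [Double] -> Double
sumSequential = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Two interleaved accumulators combined at the end: the evaluation
-- order of a reduction that was vectorised with two lanes.
sumTwoLanes :: [Double] -> Double
sumTwoLanes = go 0 0
  where
    go a b (x:y:rest) = go (a + x) (b + y) rest
    go a b [x]        = (a + x) + b
    go a b []         = a + b

main :: IO ()
main = do
  let xs = [1e16, 1, -1e16, 1]
  print (sumSequential xs)  -- 1.0: the first 1 is lost when added to 1e16
  print (sumTwoLanes xs)    -- 2.0: both 1s end up in the second accumulator
```

Since the two orders can give different results, a compiler may only pick the second one if the IR explicitly allows reassociation, which is what the fast math flags are for.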

Solution

It may be good for accelerate-llvm to offer a way for users to influence LLVM's optimisation passes. The fast math flags need separate treatment: clang -ffast-math seems to influence only how C is lowered to LLVM IR, not how the IR itself is optimised, so to get fast-math behaviour, the accelerate-llvm codegen needs to be changed. Even if we made a default decision there that is different from "fully safe", the user may still want to tune this.

There are multiple possible API designs here:

1. Per kernel; this requires extensive additional annotation support. Robbert worked on this (see post below), but this has not yet been merged.
2. Per Acc program, using an additional runN variant that takes a record with various settings. It would be good to ensure that users cannot rely on this record to have a particular number of fields, so that we can add more options in the future without breaking clients. A possible design here is like Request and defaultRequest in http-client.
3. Per Haskell program, using additional +ACC flags, e.g. +ACC -Xclang -ffast-math -ACC, mirroring the API in clang itself for passing options to (I think!) collect2.
4. Per Haskell program, alternative: using an additional environment variable; the llvm-pretty branch already responds to ACCELERATE_LLVM_CLANG_PATH, and we could add e.g. ACCELERATE_LLVM_CLANG_OPTIONS="-ffast-math". I don't like this because it does not offer an obvious way to pass options containing spaces.
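A minimal sketch of what the settings record for the per-Acc-program design could look like, following the Request/defaultRequest pattern. Everything here is an assumption made for illustration: CompileOptions, MkCompileOptions, clangOptions, fastMath and defaultCompileOptions do not exist in accelerate-llvm, and the value 8 for the LLVM option is arbitrary.

```haskell
-- Hypothetical sketch: none of these names exist in accelerate-llvm today.
-- In a library, the export list would expose the type, its field
-- accessors and the default value, but not the constructor, e.g.
--   module Hypothetical.Options
--     (CompileOptions, clangOptions, fastMath, defaultCompileOptions) where

data CompileOptions = MkCompileOptions
  { clangOptions :: [String]  -- extra clang arguments, one element per argument
  , fastMath     :: Bool      -- whether codegen should emit fast math flags
  } deriving (Show, Eq)

-- The only way for clients to obtain a value. When a field is added
-- later, its default goes here and existing client code is unaffected.
defaultCompileOptions :: CompileOptions
defaultCompileOptions = MkCompileOptions
  { clangOptions = []
  , fastMath     = False
  }

main :: IO ()
main = do
  -- Clients start from the default and override fields with record
  -- update syntax; this keeps compiling when more fields are added.
  let opts = defaultCompileOptions
        { fastMath     = True
        , clangOptions = ["-mllvm", "-epilogue-vectorization-minimum-VF=8"]
        }
  print opts
```

Because the constructor is hidden, clients can neither pattern match on all fields nor construct the record positionally, so they cannot depend on the number of fields. Passing clang arguments as a list with one element per argument would also avoid the problem with options containing spaces.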

May 24 '25 20:05 tomsmeding

Re 3: this might also be an environment variable, to allow easy unixy wrappers:

ACCELERATE_CLANG="myclang -fmy-option" ./my-accelerate-app
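As a rough sketch of how such a variable might be consumed, assuming the simplest possible rule of splitting on whitespace: the variable name ACCELERATE_CLANG is only the proposal from this comment, and parseClangCommand is a hypothetical helper, not an existing accelerate-llvm function.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- Hypothetical helper: split the value of the proposed ACCELERATE_CLANG
-- variable into a program name and its arguments, on whitespace only.
parseClangCommand :: String -> Maybe (FilePath, [String])
parseClangCommand value =
  case words value of
    []            -> Nothing
    (prog : args) -> Just (prog, args)

main :: IO ()
main = do
  -- Fall back to plain "clang" when the variable is not set.
  value <- lookupEnv "ACCELERATE_CLANG"
  print (parseClangCommand (fromMaybe "clang" value))
  -- Limitation of whitespace splitting: one argument containing a
  -- space is cut into two separate arguments.
  print (parseClangCommand "myclang -I/opt/my headers")
```

The second call shows the weakness of this approach: an argument that contains a space is broken apart, so a real implementation would need shell-like quoting rules or a different separator.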

May 24 '25 20:05 exaexa

Thanks; added above.

May 24 '25 21:05 tomsmeding

(just happened to see this issue)

Another option would be to allow this to be set on a per-expression level, like I did here:

https://github.com/robbert-vdh/accelerate/blob/feature/force-inline/src/Data/Array/Accelerate/Annotations.hs#L309
https://github.com/robbert-vdh/accelerate-llvm/blob/f34bf674d5470300451a944f40bdb0671268f11f/accelerate-llvm/src/LLVM/AST/Type/Instruction.hs#L459
https://github.com/robbert-vdh/thesis

That way you can enable it when it makes sense to do so from a performance point of view, and leave it disabled (or disable it again for part of the program) in cases where it would cause your program to misbehave (like when you need to check for infinities and NaN values).

May 24 '25 21:05 robbert-vdh

@robbert-vdh Thanks! Yes, very relevant to this issue. Added in the list above. I suspect that in terms of implementation effort, doing this on the Acc-program level or the Haskell program level is much easier, though. (I'm personally a fan of the Acc-level version with an additional runN variant.)

May 24 '25 21:05 tomsmeding

Re the per-kernel settings, is there currently some kind of API exposed that would allow folks to compile the kernels manually and assemble them into programs?

May 25 '25 11:05 exaexa

There is not; kernel construction and compilation happen automatically and the user currently has no control over that. API suggestions welcome, but no guarantees we'll have time to implement it. :)

May 25 '25 11:05 tomsmeding
