accelerate-llvm
Allow influencing Clang/LLVM options and code generation
Problem
Power users who want to get the most out of their program may want to tune the optimisations done by LLVM. For example, @exaexa determined from the assembly output that LLVM's auto-vectorisation pass was performing epilogue vectorisation in a loop where skipping it would have been highly beneficial. Tuning epilogue-vectorization-minimum-VF made their program faster.
Relatedly, accelerate-llvm currently does not set any fast math flags in the generated LLVM IR. This results in (relative) slowness in some applications; most damningly, sum is not vectorised in Accelerate at the moment for this reason.
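To illustrate why missing fast math flags get in the way of vectorising sum: floating-point addition is not associative, so without permission to reassociate, an optimiser has to keep the strict left-to-right order of the additions. Below is a minimal sketch in plain Haskell (ordinary lists, not Accelerate code; sequentialSum and lanedSum are names made up for this example) contrasting the strict order with the order a 4-lane vectorised reduction would use.

```haskell
module Main (main) where

import Data.List (foldl')

-- Strict left-to-right sum: the evaluation order that must be preserved
-- when reassociation is not permitted.
sequentialSum :: [Double] -> Double
sequentialSum = foldl' (+) 0

-- Sum with four independent accumulators, mimicking the order of additions
-- in a 4-lane SIMD reduction. Leftover elements are summed sequentially.
lanedSum :: [Double] -> Double
lanedSum = go 0 0 0 0
  where
    go a b c d (w:x:y:z:rest) = go (a + w) (b + x) (c + y) (d + z) rest
    go a b c d rest           = ((a + b) + (c + d)) + sequentialSum rest

main :: IO ()
main = do
  let xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] :: [Double]
  -- The two orders need not agree to the last bit.
  print (sequentialSum xs)
  print (lanedSum xs)
  -- The classic three-element case: regrouping changes the result.
  print ((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + (0.3 :: Double)))  -- prints False
```

Both functions compute "the sum", but only the first matches the semantics of the unflagged IR, which is why the reduction cannot be rewritten into the second form without flags that allow reassociation.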
Solution
It may be good for accelerate-llvm to offer a way for users to influence LLVM's optimisation passes. As for the fast math flags: clang -ffast-math seems to influence only how C is lowered to LLVM IR, not how the IR itself is optimised, so getting fast-math behaviour requires changing the accelerate-llvm codegen. Even if we choose a default other than "fully safe", the user may still want to tune this.
There are multiple possible API designs here:
- Per kernel; this requires extensive additional annotation support. Robbert worked on this (see post below), but this has not yet been merged.
- Per Acc program, using an additional runN variant that takes a record with various settings. It would be good to ensure that users cannot rely on this record to have a particular number of fields, so that we can add more options in the future without breaking clients. A possible design here is like Request and defaultRequest in http-client.
- Per Haskell program, using additional +ACC flags, e.g. +ACC -Xclang -ffast-math -ACC, mirroring the API in clang itself for passing options to (I think!) collect2.
- Per Haskell program, alternative: using an additional environment variable; the llvm-pretty branch already responds to ACCELERATE_LLVM_CLANG_PATH, and we could add e.g. ACCELERATE_LLVM_CLANG_OPTIONS="-ffast-math". I don't like this because it does not offer an obvious way to pass options containing spaces.
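For the per-Acc-program option, a settings record in the style of http-client could look roughly like the sketch below. Everything here is hypothetical: CompileOptions, defaultCompileOptions, FastMath and the field names do not exist in accelerate-llvm, and the runN variant that would consume the record is left out entirely. The point is only the shape of the API: clients start from a default value and use record update syntax, so new fields can be added later without breaking them.

```haskell
module Main (main) where

-- Hypothetical: how aggressively floating-point code may be rewritten.
data FastMath = FastMathOff | FastMathOn
  deriving (Eq, Show)

-- Hypothetical settings record. In a real library this would live in its
-- own module that exports the type and the field names but NOT the
-- constructor, so clients cannot depend on the number of fields.
data CompileOptions = CompileOptions
  { optFastMath   :: FastMath   -- fast math flags on generated IR
  , optClangFlags :: [String]   -- extra flags passed through to clang
  } deriving (Eq, Show)

-- The only way clients would obtain a CompileOptions value.
defaultCompileOptions :: CompileOptions
defaultCompileOptions = CompileOptions
  { optFastMath   = FastMathOff
  , optClangFlags = []
  }

main :: IO ()
main = do
  -- Clients override only the fields they care about.
  let opts = defaultCompileOptions { optFastMath = FastMathOn }
  print opts
```

Because each flag is a separate list element in optClangFlags, this design also sidesteps the problem of options containing spaces.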
Re 3: this could be an environment variable, to allow easy unixy wrappers:
ACCELERATE_CLANG="myclang -fmy-option" ./my-accelerate-app
Thanks; added above.
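A minimal sketch of the environment-variable approach, assuming the variable name ACCELERATE_LLVM_CLANG_OPTIONS proposed above and a made-up helper parseClangOptions. It splits the value on whitespace, which is the simplest thing to do and also shows the limitation mentioned in the list: an option that itself contains a space is broken apart.

```haskell
module Main (main) where

import System.Environment (lookupEnv)

-- Hypothetical helper: turn the raw variable contents into clang arguments
-- by splitting on whitespace. An unset variable means no extra options.
parseClangOptions :: Maybe String -> [String]
parseClangOptions = maybe [] words

main :: IO ()
main = do
  raw <- lookupEnv "ACCELERATE_LLVM_CLANG_OPTIONS"
  putStrLn ("extra clang options: " ++ show (parseClangOptions raw))
  -- The limitation: a single option containing a space becomes two arguments.
  print (parseClangOptions (Just "-ffast-math -I/opt/my headers"))
```

Handling such options properly would need some quoting convention, which is exactly the part that has no obvious answer with a single string-valued variable.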
(just happened to see this issue)
Another option would be to allow this to be set on a per-expression level, like I did here:
https://github.com/robbert-vdh/accelerate/blob/feature/force-inline/src/Data/Array/Accelerate/Annotations.hs#L309
https://github.com/robbert-vdh/accelerate-llvm/blob/f34bf674d5470300451a944f40bdb0671268f11f/accelerate-llvm/src/LLVM/AST/Type/Instruction.hs#L459
https://github.com/robbert-vdh/thesis
That way you can enable it when it makes sense to do so from a performance point of view, and leave it disabled (or disable it again for part of the program) in cases where it would cause your program to misbehave (like when you need to check for infinities and NaN values).
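To make the scoping idea concrete, here is a toy sketch. It is not the annotation API from the links above; the Exp type, the WithFastMath node and the flags function are invented for illustration. An annotation node sets the fast-math flag for everything beneath it, and a nested annotation can switch it off again for a sub-expression.

```haskell
module Main (main) where

-- Toy expression language (hypothetical, unrelated to Accelerate's AST).
data Exp
  = Lit Double
  | Add Exp Exp
  | Mul Exp Exp
  | WithFastMath Bool Exp   -- scoped annotation: applies to the sub-expression
  deriving Show

-- List every arithmetic node together with the fast-math setting in effect
-- at that node. The first argument is the setting inherited from outside.
flags :: Bool -> Exp -> [(String, Bool)]
flags _  (Lit _)            = []
flags fm (Add a b)          = ("add", fm) : flags fm a ++ flags fm b
flags fm (Mul a b)          = ("mul", fm) : flags fm a ++ flags fm b
flags _  (WithFastMath fm e) = flags fm e

-- Fast math on for the whole expression, except the inner addition.
example :: Exp
example =
  WithFastMath True
    (Add (Mul (Lit 1) (Lit 2))
         (WithFastMath False (Add (Lit 3) (Lit 4))))

main :: IO ()
main = mapM_ print (flags False example)
```

A code generator following this scheme would attach fast math flags to the outer add and the mul, but emit the inner add without them.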
@robbert-vdh Thanks! Yes, very relevant to this issue. Added in the list above. I suspect that in terms of implementation effort, doing this on the Acc-program level or the Haskell program level is much easier, though. (I'm personally a fan of the Acc-level version with an additional runN variant.)
Re the per-kernel settings, is there currently some kind of API exposed that would allow folks to compile the kernels manually and assemble them into programs?
There is not; kernel construction and compilation happen automatically, and the user currently has no control over that. API suggestions welcome, but no guarantees we'll have time to implement it. :)