cutde
The pocl compilation time for the ACA kernel is bizarrely long.
Most of the OpenCL kernels seem to take about 1-3 seconds to compile with `pocl`. But `aca.cu` takes almost three minutes. This is true on both my machine and the GitHub Actions CI servers, so I suspect it will replicate elsewhere too.
Ideas:
- Maybe the problem is just an issue with the pocl conda package and is due to some misconfigured LLVM flags. See here where this issue is suggested: http://portablecl.org/docs/html/faq.html#why-is-pocl-slow
- The problem vanishes if I turn off optimization flags using `-cl-opt-disable`. I would actually be okay leaving this flag on all the time. But that caused some segfault errors! I'm not sure what the problem is there.
- Obviously the Mako code generation is producing a lot of code, but it's not substantially more code in `aca.cu` compared to the other kernels like `free.cu` or `block.cu`. So, what is causing the compiler to burp here?
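
For anyone who wants to reproduce the comparison, here is a minimal timing sketch, not the code cutde actually uses. It assumes PyOpenCL is installed with pocl as the selected platform, and that you already have the rendered kernel source as a string (how cutde renders `aca.cu` through Mako is not shown here). The names `time_builds` and `build_with_pyopencl` are hypothetical helpers made up for this example.

```python
import os
import time


def time_builds(build_fn, source, option_sets):
    """Time one build of `source` per option set.

    build_fn(source, options) does the actual compile; option_sets maps a
    label to a list of OpenCL build options. Returns {label: seconds}.
    """
    timings = {}
    for label, options in option_sets.items():
        start = time.perf_counter()
        build_fn(source, options)
        timings[label] = time.perf_counter() - start
    return timings


def build_with_pyopencl(source, options):
    """Hypothetical helper: compile `source` with PyOpenCL."""
    # Ask PyOpenCL to skip its binary cache so every call recompiles.
    os.environ["PYOPENCL_NO_CACHE"] = "1"
    import pyopencl as cl  # imported lazily so the timer works without it

    ctx = cl.create_some_context(interactive=False)
    return cl.Program(ctx, source).build(options=list(options))


# The two configurations discussed in this issue.
OPTION_SETS = {
    "default": [],
    "no-opt": ["-cl-opt-disable"],
}
```

Calling `time_builds(build_with_pyopencl, rendered_source, OPTION_SETS)` for the rendered `aca.cu`, `free.cu`, and `block.cu` sources should show whether the slowdown is specific to the optimized `aca.cu` build. Note that pocl may keep its own kernel cache, so a second run can look faster than a cold compile.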