noma

16 comments by noma

Thanks, that seems to be the issue here. Within some bounds, the compiler does vectorise the fully unrolled loop within the kernel. Performance is still poor, though. Also `-cl-fast-relaxed-math` is...

@pjaaskel Thanks for the insights. What I, in the role of an application developer using OpenCL on a SIMD-machine, would like to have is plain and simple outer loop vectorisation...
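
A minimal sketch of what outer-loop vectorisation across work-items could look like when written out by hand in C, assuming the implicit work-item loop is made explicit and annotated with an OpenMP SIMD directive. The names `work_item` and `run_work_group` and the kernel body are hypothetical and only serve as illustration; this is not how PoCL or any particular OpenCL compiler implements it.

```c
#include <stddef.h>

/* Hypothetical kernel body: the computation of a single work-item,
 * containing a short inner loop (assumption for illustration). */
static inline float work_item(const float *in, size_t gid, size_t inner_n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < inner_n; ++k)
        acc += in[gid] * (float)(k + 1);
    return acc;
}

/* The implicit outer loop over work-items, written explicitly.
 * The SIMD directive asks the compiler to map consecutive work-items
 * to SIMD lanes instead of vectorising the inner loop. Without OpenMP
 * support enabled, the pragma is ignored and the loop runs as scalar code. */
void run_work_group(const float *in, float *out, size_t n_items, size_t inner_n)
{
    #pragma omp simd
    for (size_t gid = 0; gid < n_items; ++gid)
        out[gid] = work_item(in, gid, inner_n);
}
```

Whether the compiler actually emits vector code for the outer loop depends on the compiler, its flags, and the kernel body; the directive only expresses the intent.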

@eschnett Thanks, I wasn't aware of the second issue. With OpenMP SIMD directives, I noticed the Intel compiler replacing the division variant of the loop with a precise reciprocal from...

@franz Thanks for pointing out the OpenCL compiler options. The actual goal here is to achieve outer-"loop" vectorization across the work-items, i.e. the kernel being compiled into something like...

@Kazhuu Thanks, that's very interesting. Which loop exactly did you annotate? The inner loop in the kernel or the implicit outer loop, i.e. the code somewhere in PoCL that processes...

I thought about this when I added the double support, but did not want to break the API. I think a template solution only makes sense if: - users are...