clspv
clspv copied to clipboard
Improperly sinking a barrier
Originally from https://github.com/KhronosGroup/SPIRV-Cross/issues/1511.
#define THREADS 256
#define THREADS_HALF (THREADS/2)
__kernel void main_kernel(__global float* data, __global float* output, int count)
{
__local float list[THREADS];
const int point1Index = get_global_id(1);
const int index = get_local_id(0);
list[index] = data[index];
for (int halfSize = THREADS_HALF; halfSize > 0; halfSize >>= 1)
{
if (index < halfSize)
{
const float value1 = list[index];
const float value2 = list[index + halfSize];
if (value2 < value1) list[index] = value2;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
if (!index)
{
output[point1Index] = list[0];
}
}
LLVM loop optimizations transform the loop iteration and sink the barrier into the continue target of the loop. The loop's branch condition ends up based on the local thread id so the workgroup is diverged without a merge instruction.
It is worth noting that if the barrier is moved before the if statement, clspv ends up with the barrier in the loop header.