ROCm-OpenCL-Runtime
local memory global used by non-kernel function
When using ROCm with OpenCL 2.0 device-side enqueue, I get the following error:
error: <unknown>:0:0: in function __compute_block_invoke void (i8*): local memory global used by non-kernel function
This happens when I try to use a work-group function in a device-side enqueued kernel. For example, the following kernel gives this error:
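```c
// Helper called from the enqueued block; every work-item contributes 1.
void compute_sum(global float* sum){
    float s = 1.0f;
    *sum = work_group_reduce_add(s);
}

// Kernel pointer arguments need an explicit address space (global here).
kernel void compute(global float* sum){
    enqueue_kernel(
        get_default_queue(),
        CLK_ENQUEUE_FLAGS_NO_WAIT,
        ndrange_1D(1 * 64, 64),   // global size 64, local size 64
        ^{compute_sum(sum);}
    );
}
```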
I expected the device-side enqueued kernel to run with a local work-group size of 64, so the value written to sum should be 64.
Please let me know if I have misunderstood something in the OpenCL specification. Otherwise, is there a chance this will be fixed in a future release?
With kind regards,
Robbert
Unfortunately not; I stopped using kernel-side enqueueing for this reason.
A workaround would be to program the summation yourself, but I had grown rather fond of the work_group_* functions, so I stopped using kernel-side enqueue instead.
I hope this will be fixed in a future release.
@robbert-harms does your GPU happen to be from the new Navi (RDNA) architecture?
@elad8a No, I have a Radeon VII (Vega 20) GPU. It seems to be a broader issue than just Navi/RDNA.
@robbert-harms, check out the response from AMD; hopefully we will have a fix soon.
https://community.amd.com/thread/251691
Great news! Thanks for letting me know. This issue has been a showstopper for me. I hope the fix works; otherwise work-group functions are basically useless in dynamic (device-side enqueued) kernels.