Enqueueing a kernel with a reqd_work_group_size with a NULL local_work_size
We don't seem to have any coverage for the case where a kernel has a required work-group size:
__attribute__((reqd_work_group_size(2, 3, 4)))
kernel void test(...) { ... }
And the kernel is enqueued with NULL passed as the local work-group size:
clEnqueueNDRangeKernel(
    queue, kernel, work_dim, global_work_offset, global_work_size, NULL,
    num_events_in_wait_list, event_wait_list, event);
In this case, I believe the implementation should use the required work-group size as the work-group size, though the spec isn't as clear about this as I'd like. We may also want to clarify the spec.
The non-uniform work-groups test suite has some provisions for testing a required work-group size and a local work-group size equal to NULL, but this specific combination does not appear to be tested.
The API test for "kernel_required_group_size" executes a kernel, but passes an explicit local work-group size.
The API test for "kernel_attributes" only compiles kernels and never executes them.
Testing notes:
- We should test cases where the passed-in work_dim is 1, 2, and 3.
  - If the passed-in work_dim is 2 then the third component (Z) of the required work-group size must be 1.
  - If the passed-in work_dim is 1 then both the second and third components (Y and Z) of the required work-group size must be 1.
- We should test both the command-queue path (clEnqueueNDRangeKernel) and the command-buffer path (clCommandNDRangeKernelKHR).
- We should also test that the required work-group size is returned by clGetKernelSuggestedLocalWorkSizeKHR, for implementations that support cl_khr_suggested_local_work_size.
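To make the work_dim constraints above concrete, here is a minimal sketch of two host-side helpers a test could use when choosing which required work-group sizes to pair with each work_dim. The names reqd_size_valid_for_dim and make_global_size are hypothetical and not part of the CTS or the OpenCL API. The sketch also assumes the test wants a global size that is a whole multiple of the required size in every used dimension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a required work-group size can be
 * used with the given work_dim, i.e. every component at or beyond work_dim
 * is 1 (Z == 1 for work_dim 2, Y == Z == 1 for work_dim 1). */
bool reqd_size_valid_for_dim(const size_t reqd[3], unsigned work_dim)
{
    if (work_dim < 1 || work_dim > 3) return false;
    for (unsigned i = 0; i < 3; ++i) {
        if (reqd[i] == 0) return false;
        if (i >= work_dim && reqd[i] != 1) return false;
    }
    return true;
}

/* Hypothetical helper: fills global[] with a global work size that is a
 * whole multiple of the required work-group size in each used dimension.
 * Unused dimensions are set to 1. */
void make_global_size(const size_t reqd[3], unsigned work_dim,
                      size_t groups_per_dim, size_t global[3])
{
    assert(reqd_size_valid_for_dim(reqd, work_dim));
    for (unsigned i = 0; i < 3; ++i)
        global[i] = (i < work_dim) ? reqd[i] * groups_per_dim : 1;
}
```

With a required size of (2, 3, 1) and work_dim 2, asking for 4 groups per dimension gives a global size of (8, 12, 1). That global size would then be passed to the enqueue with a NULL local work-group size.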
Discussed in the August 26th teleconference. We agree that this should be valid, and hence we should have testing for this scenario.
Added mobica-backlog.
I had a bit of extra time so I wrote a quick test for clEnqueueNDRangeKernel and clGetKernelSuggestedLocalWorkSizeKHR. I need to clean it up a bit, but I think it will be sufficient for these scenarios, and I don't want to duplicate work.
I haven't done any testing via command buffers yet.