Enqueueing a kernel with a reqd_work_group_size with a NULL local_work_size

Open bashbaug opened this issue 4 months ago • 2 comments

We don't seem to have any coverage for the case where a kernel has a required work-group size:

__attribute__((reqd_work_group_size(2, 3, 4)))
kernel void test(...) { ... }

And the ND-range enqueue of that kernel passes NULL as the local work-group size:

clEnqueueNDRangeKernel(
    queue, kernel, work_dim, global_work_offset, global_work_size, NULL,
    num_events_in_wait_list, event_wait_list, event);

In this case, I believe the implementation should use the required work-group size as the work-group size, though the spec isn't as clear about this as I'd like. We may also want to clarify the spec.
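To make the expected behavior concrete, here is a minimal sketch in plain C that models which work-group size a test might expect. It is not implementation or CTS code: the function name `expected_local_size` and the `sketch_status` codes are hypothetical, and the NULL case encodes the reading proposed above rather than settled spec text. The mismatch case follows the existing rule that a non-NULL local_work_size must match the required work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only. Real host code would use
 * cl_int values such as CL_SUCCESS and CL_INVALID_WORK_GROUP_SIZE. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_INVALID_WORK_GROUP_SIZE = 1
} sketch_status;

/* Models the work-group size a test would expect the implementation to use.
 *   reqd            - the kernel's reqd_work_group_size (three components)
 *   local_work_size - the enqueue argument, or NULL
 *   work_dim        - 1, 2, or 3
 *   out             - receives work_dim entries on success
 * Assumption (the reading proposed in this issue): a NULL local_work_size
 * means the required work-group size is used. */
static sketch_status expected_local_size(const size_t reqd[3],
                                         const size_t *local_work_size,
                                         unsigned work_dim, size_t out[3])
{
    assert(reqd != NULL && out != NULL);
    assert(work_dim >= 1 && work_dim <= 3);

    /* An explicit local size must match the required size exactly. */
    if (local_work_size != NULL) {
        for (unsigned i = 0; i < work_dim; ++i) {
            if (local_work_size[i] != reqd[i])
                return SKETCH_INVALID_WORK_GROUP_SIZE;
        }
    }

    /* NULL or matching: the required size is the work-group size. */
    for (unsigned i = 0; i < work_dim; ++i)
        out[i] = reqd[i];
    return SKETCH_OK;
}
```

A test could compare this expectation against the values a kernel records from get_local_size() for each dimension.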

The non-uniform work-groups test suite has some provisions for testing a required work-group size and a local work-group size equal to NULL, but this specific combination does not appear to be tested.

The API test for "kernel_required_group_size" executes a kernel, but passes an explicit local work-group size.

The API test for "kernel_attributes" only compiles kernels and never executes them.

Testing notes:

  • We should test cases where the passed-in work_dim is 1, 2, and 3.
    • If the passed-in work_dim is 2 then the third component (Z) of the required work-group size must be 1.
    • If the passed-in work_dim is 1 then both the second and third components (Y and Z) of the required work-group size must be 1.
  • We should test both the command-queue path (clEnqueueNDRangeKernel) and the command-buffer path (clCommandNDRangeKernelKHR).
  • We should also test that the required work-group size is returned by clGetKernelSuggestedLocalWorkSizeKHR, for implementations that support cl_khr_suggested_local_work_size.
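The work_dim constraints in the notes above can be sketched as a small helper a test might use when picking required work-group sizes. This is plain C with no OpenCL dependency. The names `reqd_size_fits_work_dim` and `make_global_size`, and the `groups_per_dim` parameter, are hypothetical. Choosing a global size that is a whole multiple of the required size is an assumption made here to keep non-uniform work-groups out of the picture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a required work-group size can be used with work_dim:
 * every component at index >= work_dim must be 1. */
static bool reqd_size_fits_work_dim(const size_t reqd[3], unsigned work_dim)
{
    assert(reqd != NULL);
    if (work_dim < 1 || work_dim > 3)
        return false;
    for (unsigned i = work_dim; i < 3; ++i) {
        if (reqd[i] != 1)
            return false;
    }
    return true;
}

/* Fills global[] with a whole number of work-groups per used dimension,
 * so the ND-range divides evenly by the required work-group size.
 * Unused dimensions are set to 1. */
static void make_global_size(const size_t reqd[3], unsigned work_dim,
                             size_t groups_per_dim, size_t global[3])
{
    assert(reqd != NULL && global != NULL);
    assert(work_dim >= 1 && work_dim <= 3);
    for (unsigned i = 0; i < 3; ++i)
        global[i] = (i < work_dim) ? reqd[i] * groups_per_dim : 1;
}
```

For example, a required size of (2, 3, 4) only fits work_dim 3, while (2, 1, 1) fits all three dimensionalities, so a test would need at least three kernels (or three attribute variants) to cover work_dim 1, 2, and 3.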

bashbaug, Aug 25 '25 18:08

Discussed in the August 26th teleconference. We agree that this should be valid, and hence we should have testing for this scenario.

Added mobica-backlog.

bashbaug, Aug 26 '25 16:08

I had a bit of extra time so I wrote a quick test for clEnqueueNDRangeKernel and clGetKernelSuggestedLocalWorkSizeKHR. I need to clean it up a bit, but I think it will be sufficient for these scenarios, and I don't want to duplicate work.

I haven't done any testing via command buffers yet.

bashbaug, Sep 03 '25 22:09