`queue::submit` sometimes doesn't enqueue kernels before it returns, or even after host tasks complete, under the OpenCL backend

Open HPS-1 opened this issue 1 month ago • 0 comments

Describe the bug

This bug was discovered while we were investigating a flaky test: out_of_order_queue_status_khr_empty.cpp fails intermittently on OpenCL machines. The test checks queue::khr_empty(), which returns true if the calling queue is empty and false otherwise, for out-of-order queues. The test code can be broken down into the following steps:

First, it sets all elements in array X[100] to be 99;
Then it adds 1 to all elements in array X. Notice that this is done using host tasks.
Now that X is an array of 100 elements, each with a value of 100, it copies the contents of array X to array Y[100].
Then in a for loop, it enqueues 5 kernels to double all elements in Y in groups of 20: Y[0-19], Y[20-39], Y[40-59], Y[60-79], and Y[80-99].
After waiting for a short time, the test calls Q.khr_empty(). If khr_empty() returns true, all kernels should have been enqueued and completed, so the test checks that all elements in Y have been doubled to 200.

And here is the problem: sometimes (see below for instructions on how to reproduce the issue) the test fails and reports that the elements of Y are not all 200. This means that khr_empty() returned true although the kernels that double the elements of Y had not finished yet. We discussed this further in this PR: https://github.com/intel/llvm/pull/20663 In short, this is likely a runtime bug: under the OpenCL backend, when host tasks are running, queue::submit() doesn't necessarily enqueue the kernels before it returns. In fact, even after all host tasks have completed, the kernels may still not be enqueued.

To reproduce

First go to the llvm directory on your machine, and then run:

clang++  -Werror -fsycl -fsycl-targets=spir64  ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp -o ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out

to build the test. Then run:

for i in {1..100}; do ONEAPI_DEVICE_SELECTOR=opencl:cpu  ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out & done

to run 100 instances of the test concurrently. You should see a number of assertion failures. Try increasing 100 to 1000 if that doesn't work.
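Because the loop above backgrounds every instance and never waits for them, failures only show up as assertion output mixed into the terminal. A minimal sketch of a helper that waits for each instance and counts the non-zero exits is shown here. `run_parallel` is a hypothetical name, not part of the repository, and the sketch assumes the test binary exits with a non-zero status when an assertion fails.

```shell
# Hypothetical helper (not part of the llvm repo): run N concurrent copies of
# a command, wait for all of them, and print how many exited non-zero.
# The body is a subshell "( ... )" so its variables do not leak to the caller.
run_parallel() (
  n=$1
  shift
  pids=""
  failed=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # Start one instance in the background and discard its output.
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  for pid in $pids; do
    # wait returns the exit status of that instance.
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
)
```

For example, `run_parallel 100 env ONEAPI_DEVICE_SELECTOR=opencl:cpu ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out` would print the number of failing instances out of 100. Replacing 100 with 10 or 1000 gives a quick way to compare failure counts at different levels of concurrency.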

Here is also a revised version of the test with more debugging printouts, as discussed in the aforementioned PR: test.cpp. Note that you can also redirect the output of the test into a log file using a command like:

for i in {1..1000}; do ONEAPI_DEVICE_SELECTOR=opencl:cpu ./sycl/test-e2e/Basic/test.cpp.tmp.out & done >>log.txt 2>&1
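With a single shared log.txt, the output of concurrently running instances can interleave, which makes it hard to tell which printouts belong to which run. One possible alternative, sketched here, writes one log file per instance and prints the paths of the logs whose instance failed. `run_logged` and the `run_<i>.log` naming are hypothetical, and the sketch again assumes a failing assertion results in a non-zero exit status.

```shell
# Hypothetical helper (not part of the llvm repo): run N concurrent copies of
# a command, write each instance's output to DIR/run_<i>.log, and print the
# log paths of the instances that exited non-zero.
run_logged() (
  n=$1
  dir=$2
  shift 2
  mkdir -p "$dir" || exit 1
  entries=""
  i=1
  while [ "$i" -le "$n" ]; do
    # Each instance gets its own log file, so outputs never interleave.
    "$@" >"$dir/run_$i.log" 2>&1 &
    entries="$entries $i:$!"
    i=$((i + 1))
  done
  for entry in $entries; do
    idx=${entry%%:*}   # instance index
    pid=${entry##*:}   # process id of that instance
    wait "$pid" || echo "$dir/run_$idx.log"
  done
)
```

For example, `run_logged 1000 ./logs env ONEAPI_DEVICE_SELECTOR=opencl:cpu ./sycl/test-e2e/Basic/test.cpp.tmp.out` would leave the debugging printouts of each failing run in its own file under ./logs.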

A few more notes:

Running the test using a level_zero:gpu device doesn't reproduce the error. So this is most likely an OpenCL-exclusive issue.
If the test waits for the host tasks to complete first (you can do so by uncommenting the sleep_for() at line 42 in test.cpp), the error is gone. So this error likely only happens when queue::submit() is called while host tasks are running or waiting to be run.
This error only happens when many instances of the test are run at the same time. (Change for i in {1..100} in the command above to for i in {1..10} and the error seems to be gone.) So the workload on the device may also be a triggering factor.

Environment

OS: should not matter, but I was using a Linux machine
Target device and vendor: I reproduced the error on an OpenCL CPU device, but other OpenCL devices might work as well
DPC++ version: -
Dependencies version: -

Additional context

No response

Dec 05 '25 00:12 HPS-1