`queue::submit` sometimes doesn't enqueue kernels before it returns, or even after host tasks are complete, under the OpenCL backend
Describe the bug
This bug was discovered while we were trying to fix a test issue: out_of_order_queue_status_khr_empty.cpp fails intermittently on OpenCL machines. The test checks queue::khr_empty(), which returns true if the calling queue is empty and false otherwise, for out-of-order queues. The test code can be broken down into the following steps:
- First, it sets all elements in array X[100] to be 99;
- Then it adds 1 to all elements in array X. Notice that this is done using host tasks.
- Now that X is an array of 100 elements, each with a value of 100, it copies the contents of array X to array Y[100].
- Then in a for loop, it enqueues 5 kernels to double all elements in Y in groups of 20: Y[0-19], Y[20-39], Y[40-59], Y[60-79], and Y[80-99].
- After waiting for a short time, the test calls `Q.khr_empty()`. If `khr_empty()` returns true, all kernels are supposed to have been enqueued and completed, so the test checks whether all elements in Y have been doubled to 200.
Here is the problem: sometimes (see below for instructions on how to reproduce the issue) the test fails and complains that the values of Y's elements are not all 200. This means that khr_empty() returned true even though the kernels that double the elements of Y had not finished yet. We discussed this further in this PR: https://github.com/intel/llvm/pull/20663. In short, this is likely a bug in the runtime: when host tasks are running under the OpenCL backend, queue::submit() doesn't necessarily enqueue the kernels before it returns. In fact, even after all host tasks have completed, the kernels may still not be enqueued.
To reproduce
First go to the llvm directory on your device, and then run:
clang++ -Werror -fsycl -fsycl-targets=spir64 ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp -o ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out
to build the test. Then run:
for i in {1..100}; do ONEAPI_DEVICE_SELECTOR=opencl:cpu ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out & done
to run 100 instances of the test. You should see a number of assertion failures. Try increasing 100 to 1000 if that doesn't work.
Here is also a revised version of the test with more debugging printouts, as discussed in the aforementioned PR: test.cpp. Note that you can redirect the output of the test into a log file using a command like:
for i in {1..1000}; do ONEAPI_DEVICE_SELECTOR=opencl:cpu ./sycl/test-e2e/Basic/test.cpp.tmp.out & done >>log.txt 2>&1
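Because the loops above start every instance in the background, failures have to be spotted by eye in the interleaved output. Below is a minimal sketch of a helper that counts failing instances instead. It assumes that a failed assertion makes the test exit with a non-zero status (the usual behaviour of `assert`, but not verified against this test). `count_failures` is a hypothetical name used for illustration only; it is not part of the test suite.

```shell
# Hypothetical helper: run N copies of a command concurrently and print
# how many of them exited with a non-zero status.
# Usage: count_failures N command [args...]
# Written for POSIX sh, so its variables are global rather than local.
count_failures() {
  n=$1
  shift
  pids=""
  i=0
  # Launch all instances in the background, remembering each PID.
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Wait for each instance and count the ones that failed.
  failures=0
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}

# Example invocation (assumes the binary built by the command above):
#   count_failures 100 env ONEAPI_DEVICE_SELECTOR=opencl:cpu \
#     ./sycl/test-e2e/Basic/out_of_order_queue_status_khr_empty.cpp.tmp.out
```

The helper discards the instances' output so that only the count is printed. Drop the `>/dev/null 2>&1` redirection if the debugging printouts are needed.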
A few more notes:
- Running the test on a `level_zero:gpu` device doesn't reproduce the error, so this is most likely an OpenCL-exclusive issue.
- Waiting for the host tasks to complete first (you can do so by uncommenting the `sleep_for()` at line 42 in `test.cpp`) makes the error go away. So the error likely only happens when `queue::submit()` is called while host tasks are running or waiting to be run.
- The error only happens when many instances of the test run at the same time (change `for i in {1..100}` in the command above to `for i in {1..10}` and the error seems to be gone). So the workload on the device may also be a triggering factor.
Environment
- OS: should not matter, but I was using a Linux machine
- Target device and vendor: I reproduced the error on an OpenCL CPU device, but other OpenCL devices might reproduce it as well
- DPC++ version: -
- Dependencies version: -
Additional context
No response