level-zero
[Question] Blocking calls for data transfers and kernel launch?
In the spec, it seems that data transfers using zeCommandListAppendMemoryCopy are not blocking. Is there a blocking variant, or an equivalent of the OpenCL call clEnqueue{Read/Write}Buffer with CL_TRUE to indicate a blocking call?
Similarly, for kernel launches, is there any way to specify a blocking call?
What I have found is to close the command list and then execute all pending commands in it after each data transfer or kernel launch. For example, by running the following sequence:
zeCommandListAppendMemoryCopy( .. )
zeCommandListClose( .. )
zeCommandQueueExecuteCommandLists( .. )
zeCommandQueueSynchronize( .. )
zeCommandListReset( .. )
but is there any other way to get blocking calls?
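For reference, here is a minimal sketch of that five-call sequence wrapped in a single helper, which I am calling blocking_copy (a hypothetical name, not part of Level Zero). So that the snippet compiles on its own, the ze* functions and handle types below are small host-only stand-ins that imitate deferred execution; in a real build you would delete them and include <level_zero/ze_api.h> instead. The argument lists follow the Level Zero 1.x headers as I understand them, so please check them against the header version you build with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Host-only stand-ins, NOT the real driver. In a real build, remove this
 * section and #include <level_zero/ze_api.h> instead. --- */
typedef enum { ZE_RESULT_SUCCESS = 0 } ze_result_t;
typedef struct stub_list { void *dst; const void *src; size_t size; } *ze_command_list_handle_t;
typedef struct stub_queue { struct stub_list *submitted; } *ze_command_queue_handle_t;
typedef void *ze_event_handle_t;
typedef void *ze_fence_handle_t;

/* Appending only records the copy; nothing is transferred yet. */
static ze_result_t zeCommandListAppendMemoryCopy(ze_command_list_handle_t list,
        void *dst, const void *src, size_t size, ze_event_handle_t signal,
        uint32_t num_wait, ze_event_handle_t *wait) {
    (void)signal; (void)num_wait; (void)wait;
    list->dst = dst; list->src = src; list->size = size;
    return ZE_RESULT_SUCCESS;
}
static ze_result_t zeCommandListClose(ze_command_list_handle_t list) {
    (void)list;
    return ZE_RESULT_SUCCESS;
}
static ze_result_t zeCommandQueueExecuteCommandLists(ze_command_queue_handle_t queue,
        uint32_t count, ze_command_list_handle_t *lists, ze_fence_handle_t fence) {
    (void)count; (void)fence;
    queue->submitted = lists[0];   /* stand-in: remember the work, run it later */
    return ZE_RESULT_SUCCESS;
}
/* Stand-in for "wait until the device is done": performs the recorded copy. */
static ze_result_t zeCommandQueueSynchronize(ze_command_queue_handle_t queue,
        uint64_t timeout) {
    (void)timeout;
    if (queue->submitted && queue->submitted->size > 0)
        memcpy(queue->submitted->dst, queue->submitted->src, queue->submitted->size);
    queue->submitted = NULL;
    return ZE_RESULT_SUCCESS;
}
static ze_result_t zeCommandListReset(ze_command_list_handle_t list) {
    list->dst = NULL; list->src = NULL; list->size = 0;
    return ZE_RESULT_SUCCESS;
}
/* --- End of stand-ins --- */

/* Hypothetical helper: returns only after the copy has completed (or failed). */
static ze_result_t blocking_copy(ze_command_queue_handle_t queue,
                                 ze_command_list_handle_t list,
                                 void *dst, const void *src, size_t size) {
    ze_result_t r = zeCommandListAppendMemoryCopy(list, dst, src, size, NULL, 0, NULL);
    if (r != ZE_RESULT_SUCCESS) return r;
    r = zeCommandListClose(list);
    if (r != ZE_RESULT_SUCCESS) return r;
    r = zeCommandQueueExecuteCommandLists(queue, 1, &list, NULL);
    if (r != ZE_RESULT_SUCCESS) return r;
    r = zeCommandQueueSynchronize(queue, UINT64_MAX);  /* wait without a timeout */
    if (r != ZE_RESULT_SUCCESS) return r;
    return zeCommandListReset(list);                   /* make the list reusable */
}
```

Checking every return code matters here: if any step fails, the copy may not have happened, and the caller should not treat the source buffer as free to release or move.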
We don't have an exact equivalent of clEnqueue{Read/Write}Buffer, but there are a few different ways to get the behavior you are looking for. Have you tried immediate command lists in synchronous mode? Something like:
ze_command_queue_desc_t desc = {};
desc.mode = ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS;
zeCommandListCreateImmediate(..., &desc, &hCommandList);
zeCommandListAppendMemoryCopy(hCommandList, ...);
That might work for me. According to the spec, immediate command lists are meant for low-latency submission, so I think they are an even better fit for what I am looking for. So, with immediate command lists, using your example:
ze_command_queue_desc_t desc = {};
desc.mode = ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS;
zeCommandListCreateImmediate(..., &desc, &hCommandList);
zeCommandListAppendMemoryCopy(hCommandList, ...);
// >>>>>>> At this point of execution, is there any guarantee that the copy is finished?
For context, I am writing a wrapper for Java and I need blocking calls; otherwise, the Java GC might move the objects before the actual copy (data transfer) takes place.
Yeah, in synchronous mode, zeCommandListAppendMemoryCopy(hCommandList, ...) should block until execution completes.
Per the spec:
ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS = 1: Device execution always completes immediately on execute; Host thread is blocked using wait on implicit synchronization object
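To make the synchronous immediate command list approach concrete, here is a rough sketch with the elided arguments filled in. The helper names create_blocking_list and blocking_copy_immediate are hypothetical, and the ze* functions and types below are host-only stand-ins so the snippet compiles by itself: they only model the behavior described in the spec text above and say nothing about how a real driver implements it. With the real <level_zero/ze_api.h> you would drop the stand-ins; the argument order is taken from the Level Zero 1.x headers as I understand them and should be verified against your version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Host-only stand-ins, NOT the real driver. In a real build, remove this
 * section and #include <level_zero/ze_api.h> instead. --- */
typedef enum { ZE_RESULT_SUCCESS = 0 } ze_result_t;
typedef enum {
    ZE_COMMAND_QUEUE_MODE_DEFAULT = 0,
    ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS = 1,
    ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS = 2
} ze_command_queue_mode_t;
/* The real descriptor has more fields (stype, pNext, ordinal, ...). */
typedef struct { ze_command_queue_mode_t mode; } ze_command_queue_desc_t;
typedef struct stub_list {
    ze_command_queue_mode_t mode;
    void *dst; const void *src; size_t size;   /* copy still pending, if any */
} *ze_command_list_handle_t;
typedef void *ze_context_handle_t;
typedef void *ze_device_handle_t;
typedef void *ze_event_handle_t;

static struct stub_list stub_storage;  /* stand-in: one list is enough here */

static ze_result_t zeCommandListCreateImmediate(ze_context_handle_t ctx,
        ze_device_handle_t dev, const ze_command_queue_desc_t *desc,
        ze_command_list_handle_t *out) {
    (void)ctx; (void)dev;
    memset(&stub_storage, 0, sizeof stub_storage);
    stub_storage.mode = desc->mode;
    *out = &stub_storage;
    return ZE_RESULT_SUCCESS;
}
/* Stand-in semantics: a synchronous list finishes the copy before returning;
 * any other mode only records it as still pending. */
static ze_result_t zeCommandListAppendMemoryCopy(ze_command_list_handle_t list,
        void *dst, const void *src, size_t size, ze_event_handle_t signal,
        uint32_t num_wait, ze_event_handle_t *wait) {
    (void)signal; (void)num_wait; (void)wait;
    if (list->mode == ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS) {
        memcpy(dst, src, size);
        list->size = 0;
    } else {
        list->dst = dst; list->src = src; list->size = size;
    }
    return ZE_RESULT_SUCCESS;
}
/* --- End of stand-ins --- */

/* Hypothetical helper: create an immediate command list whose appends block. */
static ze_result_t create_blocking_list(ze_context_handle_t ctx,
                                        ze_device_handle_t dev,
                                        ze_command_list_handle_t *out) {
    ze_command_queue_desc_t desc = {0};
    /* With the real header, also fill in desc.stype and desc.ordinal here. */
    desc.mode = ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS;
    return zeCommandListCreateImmediate(ctx, dev, &desc, out);
}

/* Hypothetical helper: in synchronous mode this should return only once the
 * copy is done, so the caller may release or move src afterwards. */
static ze_result_t blocking_copy_immediate(ze_command_list_handle_t list,
                                           void *dst, const void *src, size_t size) {
    return zeCommandListAppendMemoryCopy(list, dst, src, size, NULL, 0, NULL);
}
```

For the Java wrapper case, the list would be created once and reused for every transfer, with the native side returning to Java only after blocking_copy_immediate reports ZE_RESULT_SUCCESS.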