OpenCL-Docs
What happens on cl_sync_point_khr overflow?
Didn't find a mention of this in the command buffer base extension, so here goes: `cl_sync_point_khr` is defined as `cl_uint`, from which it follows that at most `CL_UINT_MAX` commands can be unambiguously recorded to a command buffer if the implementation reserves 0 as an invalid initial value. On a basic x64 Ubuntu installation `cl_uint` appears to be defined as a 32-bit unsigned integer, which makes the maximum value absurdly high for this purpose.

Now, for the sake of argument, let's assume that a pathological application DOES record an absurd number of commands, to the point of overflowing the range of unique `cl_sync_point_khr` handles (perhaps a platform defines `cl_uint`, and by extension `cl_sync_point_khr`, as something much smaller than 32 bits; I did not see any requirements for type sizes in the data types appendix of the main OpenCL 3.0 spec).

What is the expected behavior in such a situation? My first thought was that attempting to record more commands should return `CL_OUT_OF_HOST_MEMORY`.
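For illustration, a minimal sketch of what I had in mind (plain C, with `cl_sync_point_khr` stood in by `uint32_t`; the pool type and helpers are invented for this sketch, and only `CL_OUT_OF_HOST_MEMORY` is a real OpenCL error code): an allocator that reserves 0 as invalid and fails cleanly instead of wrapping back onto already-issued handles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not from the extension spec: a per-command-buffer
 * sync-point allocator that reserves 0 as an invalid value and reports an
 * error instead of wrapping around to already-issued handles. */
typedef uint32_t cl_sync_point_khr;   /* cl_uint is 32 bits */
#define CL_SUCCESS             0
#define CL_OUT_OF_HOST_MEMORY -6      /* real OpenCL error code */

typedef struct { cl_sync_point_khr next; } sync_point_pool;

static void pool_init(sync_point_pool *p) { p->next = 1; /* 0 = invalid */ }

/* Hands out unique sync points 1..UINT32_MAX, then fails on exhaustion. */
static int pool_alloc(sync_point_pool *p, cl_sync_point_khr *out) {
    if (p->next == 0)                  /* counter wrapped: range exhausted */
        return CL_OUT_OF_HOST_MEMORY;
    *out = p->next++;
    return CL_SUCCESS;
}
```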
> maybe a platform defines `cl_uint`, and by extension `cl_sync_point_khr`, as something much smaller than 32 bits
I think `cl_uint` is always defined as 32 bits - https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_platform.h#L155 / https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_platform.h#L261 - and we define `#define CL_UINT_MAX 0xffffffffU`.

But I agree, introducing some sort of error for when this limit is exceeded makes sense, rather than specifying some behaviour that allows `cl_sync_point_khr` to be ambiguous.
> But I agree, introducing some sort of error for when this limit is exceeded makes sense...
Serious question: do we really expect to overflow sync points any time in the near or distant future, given that a `cl_uint` is 32 bits? Even if a command in a command buffer is only a handful of bytes - and commands would probably be quite a bit larger than this in practice - a command buffer that overflows sync points would be very, very large (many gigabytes).

I suppose we could define an error behavior, or choose a 64-bit type like `size_t` or `cl_ulong`, but even with a 32-bit `cl_uint` it seems like we'll run out of other resources far before sync points overflow.
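A back-of-envelope check of the "many gigabytes" claim (the 16 bytes per command is an assumption, and likely an underestimate, not a measured figure):

```c
#include <assert.h>
#include <stdint.h>

/* How many GiB of command records a command buffer would hold by the time it
 * exhausts 32-bit sync points, for an assumed per-command bookkeeping size. */
static uint64_t gib_to_overflow(uint64_t bytes_per_command) {
    const uint64_t commands = UINT32_MAX;            /* ~2^32 sync points */
    return (commands * bytes_per_command) >> 30;     /* bytes -> GiB      */
}
/* Even at a tiny 16 bytes per command, overflow needs on the order of
 * 64 GiB of command records. */
```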
> Serious question: do we really expect to overflow sync points any time in the near or distant future
This is definitely an edge case that would be more relevant in an "OpenCL SC", if that existed, and even there it should be very hard to trigger. Should.

Then again, I don't think anything is preventing implementations from reserving some of those bits to represent arbitrary internal data, so long as each sync point remains unique? This is something I only thought of today, but I actually have a use case where that might turn out to be a helpful thing to do. I'll have to experiment with that for a bit.
> Then again I don't think anything is preventing implementations from reserving some of those bits to represent arbitrary internal data so long as each sync point remains unique?
Interesting question, I could definitely see cases where this might happen. I haven't reserved any bits for my command buffer emulation layer just yet, though I did reserve sync point zero to help catch potential bugs.
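As a sketch of what that bit reservation could look like (everything here - the 8-bit tag width, the masks, the helper names - is invented for illustration, not from any spec): keep a 24-bit counter and pack 8 bits of internal data into the top byte. The full 32-bit value stays unique as long as the counter portion does, at the cost of shrinking the usable range to 2^24 - 1 sync points.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout: [31..24] implementation tag, [23..0] counter. */
typedef uint32_t cl_sync_point_khr;

#define TAG_SHIFT    24u
#define COUNTER_MASK 0x00FFFFFFu   /* low 24 bits: unique counter */

static cl_sync_point_khr make_sync_point(uint8_t tag, uint32_t counter) {
    return ((cl_sync_point_khr)tag << TAG_SHIFT) | (counter & COUNTER_MASK);
}
static uint8_t sync_point_tag(cl_sync_point_khr sp) {
    return (uint8_t)(sp >> TAG_SHIFT);
}
static uint32_t sync_point_counter(cl_sync_point_khr sp) {
    return sp & COUNTER_MASK;
}
```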
Would it be helpful to define some minimum number of commands that all implementations must support in a command buffer to at least bound e.g. the number of bits an implementation could reserve for internal data? We could even add a test for this, assuming we can pick a number that's large enough to be useful yet small enough so implementations do not run out of some other resource first.
We could have a device query for the maximum number of commands in a command buffer; the upper bound on what that query returns would be `CL_UINT_MAX` at the moment. I'm not sure how to pick a lower bound in a way that isn't arbitrary, but we could do that for the full profile and have no lower bound for the embedded profile.
A query for a device maximum seems a little artificial since (on some implementations at least) the maximum will be limited by available resources in the system and not some fixed number. My suggestion was more to guarantee that a command-buffer must (should?) be able to hold at least N commands (including sync-point dependencies) before running out of resources, for some value of N. N could be different for the embedded profile and the full profile, and we could have a CTS test to verify this behavior.
Trying to recap some points that were raised on the October 4th OpenCL teleconference when this issue was discussed:
- A command buffer could contain commands targeting more than one device, so any querying method would need to take that into account.
- A query is useful when an application passes a command buffer into a library, as it can communicate how many commands a command buffer can hold in total and how many are left. If there's not enough space left for the library to do what it wants, the library can error out before trying to record commands, since running out of space is not a recoverable error if it starts recording commands and fails.
- How does Vulkan handle recording commands if an application runs out of space in the allocated command pool? Couldn't see anything in the spec that defines what happens, but we should look closer into this and at Vulkan-SC, as well as other APIs using command-buffer-like abstractions.
- Not all commands take up the same amount of space, so if the runtime allocates a certain amount of memory at command-buffer creation it's hard to predict how many commands will fit in that allocation. E.g. a kernel command could take up more space than a buffer-copy command.
- Capping the total number of commands is useful from an application's perspective. If you have an ML model with a large number of kernels, knowing the maximum command-buffer size lets the application plan how to break those commands up into a smaller number of recordings.
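The library use case above could be sketched like this (a toy model: the `cmd_buffer` struct and `remaining_commands` query are hypothetical - no such query exists in the extension today - and only `CL_OUT_OF_RESOURCES` is a real OpenCL error code). The point is that the library checks capacity up front and fails atomically, before recording anything.

```c
#include <assert.h>
#include <stdint.h>

#define CL_SUCCESS           0
#define CL_OUT_OF_RESOURCES -5   /* real OpenCL error code */

typedef struct {
    uint32_t capacity;   /* total commands the buffer can hold */
    uint32_t recorded;   /* commands recorded so far */
} cmd_buffer;            /* stand-in for cl_command_buffer_khr state */

/* Hypothetical "commands left" query discussed above. */
static uint32_t remaining_commands(const cmd_buffer *cb) {
    return cb->capacity - cb->recorded;
}

/* Library entry point: record `needed` commands, or fail before recording
 * anything, since failure mid-recording is unrecoverable. */
static int library_record(cmd_buffer *cb, uint32_t needed) {
    if (remaining_commands(cb) < needed)
        return CL_OUT_OF_RESOURCES;    /* nothing was recorded */
    cb->recorded += needed;            /* record all commands  */
    return CL_SUCCESS;
}
```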
Had a look at a couple of other APIs for context. Interestingly, for DX12 and Vulkan the command-recording entry points don't report errors at all; they have a `void` return type. Error reporting, such as out-of-resources, happens on the equivalent of `clFinalizeCommandBufferKHR` instead.
DX12
DX12 command list errors - "Most APIs on `ID3D12GraphicsCommandList` do not return errors. Errors encountered during command list creation are deferred until `ID3D12GraphicsCommandList::Close`."
Vulkan
`vkEndCommandBuffer` defines returned error codes for `VK_ERROR_OUT_OF_HOST_MEMORY` and `VK_ERROR_OUT_OF_DEVICE_MEMORY`.

Vulkan does, however, have a limits device query covering things like `maxBoundDescriptorSets` that would restrict the amount of resources you could use in a command buffer.
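That deferred-error pattern could be modeled roughly as follows (the struct layout and helper names are invented; only the general record-returns-void / report-at-finalize shape mirrors DX12 and Vulkan):

```c
#include <assert.h>
#include <stdint.h>

#define CL_SUCCESS             0
#define CL_OUT_OF_HOST_MEMORY -6

typedef struct {
    int deferred_error;   /* first error seen while recording, else CL_SUCCESS */
    uint32_t recorded;
    uint32_t capacity;
} cmd_buffer;

/* Recording returns void; an out-of-space failure is latched, not reported. */
static void record_command(cmd_buffer *cb) {
    if (cb->deferred_error != CL_SUCCESS) return;     /* already failed */
    if (cb->recorded == cb->capacity) {
        cb->deferred_error = CL_OUT_OF_HOST_MEMORY;   /* latch the error */
        return;
    }
    cb->recorded++;
}

/* The finalize step (cf. clFinalizeCommandBufferKHR / vkEndCommandBuffer)
 * is where the application first sees the error. */
static int finalize(const cmd_buffer *cb) { return cb->deferred_error; }
```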
> How does Vulkan handle recording commands if an application runs out of space in the allocated command pool? Couldn't see anything in the spec that defines what happens, but we should look closer into this and at Vulkan-SC, as well as other APIs using command-buffer-like abstractions.
I came across some Vulkan-SC information today and thought of this issue; there are some points from their Phoenix F2F discussion on pages 2 & 3 that are relevant to this.
In the Vulkan-SC spec itself the following physical device queries are provided:
- `VkPhysicalDeviceVulkanSC10Properties::maxCommandPoolCommandBuffers` - the maximum number of command buffers that can be allocated from a single command pool.
- `VkPhysicalDeviceVulkanSC10Properties::maxCommandBufferSize` - the maximum supported size of a single command buffer in bytes. Applications can use `vkGetCommandPoolMemoryConsumption` to compare a command buffer's current memory usage to this limit.
The following struct and function are also available:
- `VkCommandPoolMemoryReservationCreateInfo` - used when creating a command pool to set the maximum number of command buffers that can be allocated from the pool, and the maximum combined size of those command buffers.
- `vkGetCommandPoolMemoryConsumption` - used to query the actual memory consumption of a command buffer at runtime.
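A rough C analogue of that scheme (field names and the byte-budget policy here are illustrative, not taken from either the Vulkan-SC or OpenCL specs): reserve a fixed budget at pool creation, reject recordings that would exceed it, and expose current consumption for querying, mirroring the roles of `VkCommandPoolMemoryReservationCreateInfo` and `vkGetCommandPoolMemoryConsumption`.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t reserved_size;   /* bytes reserved at pool creation */
    uint64_t consumed;        /* bytes used by recorded commands */
} command_pool;

/* Record a command of the given size, or reject it without side effects
 * if it would exceed the budget fixed at creation time. */
static int pool_record(command_pool *pool, uint64_t command_size) {
    if (pool->consumed + command_size > pool->reserved_size)
        return -1;                    /* would exceed the reserved budget */
    pool->consumed += command_size;
    return 0;
}

/* Analogue of vkGetCommandPoolMemoryConsumption. */
static uint64_t pool_memory_consumption(const command_pool *pool) {
    return pool->consumed;
}
```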