clvk Respect VkPhysicalDeviceLimits::maxMemoryAllocationCount

When using UBOs/SSBOs for POD arguments, each kernel enqueue performs a new memory allocation, which is freed when the kernel instance completes. We don't currently track how many kernel instances are in flight, so it is possible to exceed VkPhysicalDeviceLimits::maxMemoryAllocationCount allocations (the minimum value an implementation must support for this limit is 4096). When this happens, kernel enqueue and kernel creation commands start failing with CL_OUT_OF_RESOURCES.

To deal with this (without just failing), clvk would need to track the number of live allocations coming from buffers/images, kernels, and kernel instances, and stop enqueueing new kernels when the limit is hit. To avoid blocking the host thread at the point of enqueue, I suspect we need to move some kernel enqueue logic (e.g. the part that causes POD buffer allocation) to the background submission thread. This may also intersect with #167.
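
A rough sketch of the tracking described above, assuming a single global counter shared by everything that allocates device memory. The class `allocation_budget` and its methods are hypothetical and not part of clvk; the limit would be read from VkPhysicalDeviceLimits::maxMemoryAllocationCount at device creation.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper (not existing clvk code): counts live device memory
// allocations against a fixed limit, e.g. maxMemoryAllocationCount.
class allocation_budget {
public:
    explicit allocation_budget(uint32_t limit) : m_limit(limit), m_in_use(0) {}

    // Non-blocking: reserve one allocation slot, or return false when the
    // limit is reached. Suitable for the host thread at enqueue time.
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_in_use >= m_limit) {
            return false;
        }
        ++m_in_use;
        return true;
    }

    // Blocking: wait until a slot is free. Only intended for a background
    // submission thread, so the host thread never stalls here.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_in_use < m_limit; });
        ++m_in_use;
    }

    // Give a slot back once the corresponding allocation has been freed,
    // e.g. when a kernel instance completes.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            --m_in_use;
        }
        m_cv.notify_one();
    }

    uint32_t in_use() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_in_use;
    }

private:
    const uint32_t m_limit;
    uint32_t m_in_use;
    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
};
```

With something like this, the POD buffer allocation would call `acquire()` on the submission thread and `release()` on kernel completion, while buffer/image creation on the host thread could use `try_acquire()` and report CL_OUT_OF_RESOURCES on failure.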

It'd be nice to also deal with the cvk_kernel::MAX_INSTANCES limit (which I think we currently just ignore when enqueuing kernels?) with the same solution, although this is a per-kernel limit whereas the above is a global limit.

Mar 31 '20 15:03 jrprice

Alternately, use a single big buffer and dole out sub-allocations for individual launches. This transforms it into an internal memory allocation/deallocation problem.
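
A minimal sketch of that idea, assuming a first-fit free list over offsets into one large buffer. The names `pod_suballocator` and `k_invalid_offset` are hypothetical, and the sketch only manages offsets; it does not touch Vulkan. It assumes the total size is a multiple of the alignment, which in practice would be chosen to satisfy minUniformBufferOffsetAlignment or minStorageBufferOffsetAlignment.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Returned when no free range is large enough.
constexpr uint64_t k_invalid_offset = ~uint64_t(0);

// Hypothetical sub-allocator (not existing clvk code): hands out aligned
// offset ranges from one big buffer, so that each kernel launch needs no
// separate device memory allocation.
class pod_suballocator {
public:
    // Assumes 'size' is a multiple of 'alignment' and alignment > 0.
    pod_suballocator(uint64_t size, uint64_t alignment)
        : m_alignment(alignment) {
        m_free[0] = size;
    }

    // First-fit search; returns the offset or k_invalid_offset.
    uint64_t allocate(uint64_t size) {
        const uint64_t rounded = round_up(size);
        for (auto it = m_free.begin(); it != m_free.end(); ++it) {
            if (it->second < rounded) {
                continue;
            }
            const uint64_t offset = it->first;
            const uint64_t remaining = it->second - rounded;
            m_free.erase(it);
            if (remaining > 0) {
                m_free[offset + rounded] = remaining;
            }
            m_used[offset] = rounded;
            return offset;
        }
        return k_invalid_offset;
    }

    // Return a range to the free list, merging with adjacent free ranges.
    void release(uint64_t offset) {
        auto used = m_used.find(offset);
        if (used == m_used.end()) {
            return; // unknown offset, ignore
        }
        uint64_t size = used->second;
        m_used.erase(used);

        auto next = m_free.lower_bound(offset);
        if (next != m_free.end() && offset + size == next->first) {
            size += next->second;       // absorb the following free range
            next = m_free.erase(next);
        }
        if (next != m_free.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += size;   // extend the preceding free range
                return;
            }
        }
        m_free[offset] = size;
    }

private:
    uint64_t round_up(uint64_t size) const {
        if (size == 0) {
            size = 1; // never hand out zero-sized ranges
        }
        return (size + m_alignment - 1) / m_alignment * m_alignment;
    }

    uint64_t m_alignment;
    std::map<uint64_t, uint64_t> m_free; // offset -> size of free range
    std::map<uint64_t, uint64_t> m_used; // offset -> size of live range
};
```

Each launch would then bind its POD data at the returned offset within the shared buffer and call `release()` when the kernel instance completes; what to do when `allocate()` fails (wait, grow, or fall back to a dedicated allocation) is left open here.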

Mar 31 '20 15:03 dneto0

I'm guessing you're hitting this in a practical application, aren't you? The move to push constants is likely to make this go away in many practical cases.

In theory, failing with CL_OUT_OF_RESOURCES is perfectly acceptable behaviour, but I doubt many applications can make sense of this or have logic for retrying.

On POD buffers, sub-allocating feels much more attractive than deferring the whole command building. Small buffers and images could also be allocated from a pool to make it less likely to hit the limit.

How urgently do you need a solution?

Mar 31 '20 16:03 kpet

I'm hitting this with the MACE benchmarks.

I'm happy to wait for push constants, but just wanted to flag this as a potential issue when not using them, if we keep that option alive.

Agree that a large buffer with suballocations is a good solution.

Mar 31 '20 16:03 jrprice
