wgpu
Maximum dynamic offset limit is incorrect per the spec
If I'm reading this right, according to the spec, the requirement for the dynamic offset for a buffer binding is that bufferBinding.offset + dynamicOffset ≤ bufferBinding.buffer.size - bufferLayout.minBindingSize. That is, with the dynamic offset applied, the buffer binding must contain at least one item.
Furthermore, according to the spec, the effective binding size is clipped to stay inside the range of the underlying buffer, if the offset would otherwise shift it beyond the end.
However, based on an error I'm getting, and from reading the code, it looks like the limit is being calculated such that the end of the dynamically-offset buffer range must stay inside the range of the underlying buffer. Based on the above, I believe this is incorrect.
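To make the difference concrete, here is a minimal sketch of the two checks as I read them. The function names are hypothetical and neither is copied from wgpu or from the spec text; `min_binding_size` stands in for `bufferLayout.minBindingSize`.

```rust
/// Spec-style rule as read above: once the dynamic offset is applied, at
/// least `min_binding_size` bytes must still fit inside the buffer.
fn spec_style_ok(
    buffer_size: u64,
    binding_offset: u64,
    dynamic_offset: u64,
    min_binding_size: u64,
) -> bool {
    binding_offset
        .checked_add(dynamic_offset)
        .and_then(|start| start.checked_add(min_binding_size))
        .map_or(false, |end| end <= buffer_size)
}

/// Stricter rule (the behaviour this report describes): the entire bound
/// range, shifted by the dynamic offset, must stay inside the buffer.
fn strict_ok(
    buffer_size: u64,
    binding_offset: u64,
    binding_size: u64,
    dynamic_offset: u64,
) -> bool {
    binding_offset
        .checked_add(binding_size)
        .and_then(|end| end.checked_add(dynamic_offset))
        .map_or(false, |end| end <= buffer_size)
}
```

With a 1024-byte buffer, a binding covering bytes 0..768 and a dynamic offset of 512, the first check passes for a 16-byte minimum binding size while the second fails, because 768 + 512 exceeds 1024.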
In my case, this is a substantial limitation. I'm using a compute shader to work on data in batches; this would require that the underlying data buffers be a whole-number multiple of the batch size rather than merely the array element size. If the number of elements to work on is not an exact multiple of the batch size, then either significant GPU memory would be wasted, or the batch size would have to be made very small, reducing efficiency.
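A rough sketch of the batching arithmetic, to show where the waste comes from. The `Batch` struct and the helper names are hypothetical, and the sketch ignores the device's dynamic offset alignment limits (for example `minStorageBufferOffsetAlignment`), which real offsets must also respect.

```rust
/// One dispatch's worth of work: where the binding starts and how many of
/// its elements are live.
#[derive(Debug, PartialEq)]
struct Batch {
    dynamic_offset: u64,
    live_elements: u64,
}

/// Split `total_elements` into batches of at most `batch_elements`.
fn plan_batches(total_elements: u64, element_size: u64, batch_elements: u64) -> Vec<Batch> {
    assert!(batch_elements > 0, "batch size must be non-zero");
    let mut batches = Vec::new();
    let mut first = 0;
    while first < total_elements {
        // The final batch may be only partly full.
        let live = batch_elements.min(total_elements - first);
        batches.push(Batch {
            dynamic_offset: first * element_size,
            live_elements: live,
        });
        first += batch_elements;
    }
    batches
}

/// Bytes needed when every batch's full range must fit in the buffer.
fn strict_buffer_size(total_elements: u64, element_size: u64, batch_elements: u64) -> u64 {
    let batch_count = (total_elements + batch_elements - 1) / batch_elements;
    batch_count * batch_elements * element_size
}

/// Bytes needed when the buffer only has to hold the live elements.
fn tight_buffer_size(total_elements: u64, element_size: u64) -> u64 {
    total_elements * element_size
}
```

For 10 elements of 16 bytes in batches of 4, the last batch holds 2 live elements; the strict rule needs 192 bytes where 160 would otherwise do. The gap grows with the batch size.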
Relatedly, this use case also highlights another limitation, but the spec doesn't seem to address it: For much the same reasons that it might be necessary to dynamically specify the beginning of a bound buffer range, it'll often be necessary to specify the end of the range. In my case, consider the final "batch" of data items to process: If it isn't full, there is no way for me to shorten the bound array range to contain only the live items. This is solved iff the last item to process is the last item in the underlying buffer, and the above rule in the spec would require the binding to be clipped, but in my application this isn't the case: I am processing items in a tree, and the items that follow the last-processed item are live items in the next level of the tree hierarchy. I can use a uniform to early exit those items in the shader, but still the pipeline won't know that those items aren't written to, and may consider that unused part of the buffer to be aliasing other buffer bindings, which would have performance (?) or correctness implications. IMO it would be very beneficial for this to be relaxed, unless there's some intended workflow I'm missing. I can file a separate suggestion for that if there's agreement.
For the benefit of any hapless google-searchers in the same predicament, the error I'm getting is:
Device error: type 1
Validation Error
Caused by:
In wgpuComputePassEncoderEnd
note: encoder = `<CommandBuffer-(0, 5, Metal)>`
In a set_bind_group command
note: bind group = `FeatureSet source bind group (depth 0)`
Dynamic binding offset index 0 with offset 1572864 would overrun the buffer bound to bind group 2 -> binding 0. Buffer size is 3110496 bytes, the binding binds bytes 0..3110400, meaning the maximum the binding can be offset is 96 bytes
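The numbers in that message can be checked by hand. The sketch below only restates the figures from the error; the constant and function names are made up for illustration.

```rust
// Figures taken from the validation error above.
const BUFFER_SIZE: u64 = 3_110_496;
const BINDING_END: u64 = 3_110_400; // the binding covers bytes 0..3110400
const DYNAMIC_OFFSET: u64 = 1_572_864;

/// Largest dynamic offset that keeps the whole shifted binding inside the
/// buffer, which is the limit the error message reports.
fn strict_max_offset() -> u64 {
    BUFFER_SIZE - BINDING_END
}

/// Bytes between the dynamically offset start and the end of the buffer.
fn bytes_left_after_offset() -> u64 {
    BUFFER_SIZE - DYNAMIC_OFFSET
}
```

`strict_max_offset()` gives the 96 bytes quoted in the message, while `bytes_left_after_offset()` shows that 1537632 bytes of the buffer still lie past the requested offset.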
Looking at the code you linked to, it does seem we are more strict than the spec. We should update our validation.
We also need to make sure we always set the binding size explicitly, since we currently use VK_WHOLE_SIZE.
VUID-vkCmdBindDescriptorSets-pDescriptorSets-06715 For each dynamic uniform or storage buffer binding in pDescriptorSets, if the range was set with VK_WHOLE_SIZE then pDynamicOffsets which corresponds to the descriptor binding must be 0
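One way to read "always set the binding size" is to resolve an explicit byte length up front instead of passing the whole-size sentinel. This is a hypothetical helper, not wgpu's actual code; `None` models a binding created without an explicit size.

```rust
/// Resolve the explicit byte length of a binding.
/// Returns `None` if the requested range does not fit in the buffer.
fn resolved_binding_size(buffer_size: u64, offset: u64, size: Option<u64>) -> Option<u64> {
    // Bytes available from `offset` to the end of the buffer.
    let remaining = buffer_size.checked_sub(offset)?;
    match size {
        // Explicit size that fits: use it as given.
        Some(s) if s <= remaining => Some(s),
        // Explicit size that overruns the buffer: reject.
        Some(_) => None,
        // No size given: bind the rest of the buffer, as an explicit length.
        None => Some(remaining),
    }
}
```

With an explicit length in hand, the descriptor range never needs the sentinel, so the rule quoted above about non-zero dynamic offsets would not be triggered by it.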
The spec seems to have some missing validation; it currently says:
bufferBinding.offset + dynamicOffsets[dynamicOffsetIndex] + minBindingSize must be ≤ bufferBinding.buffer.size.
but bufferLayout.minBindingSize can be 0.
The dynamic offset needs to be subtracted from the "effective buffer binding size" in the "Validate encoder bind groups" validation function.
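A small sketch of that gap, under my reading of the comment above. The names are hypothetical, and `required_size` stands for whatever minimum size the later validation step compares against.

```rust
/// Bytes still reachable through the binding after the dynamic offset, or
/// `None` if the offset is larger than the binding itself.
fn remaining_after_dynamic_offset(effective_binding_size: u64, dynamic_offset: u64) -> Option<u64> {
    effective_binding_size.checked_sub(dynamic_offset)
}

/// Size check that accounts for the dynamic offset.
fn binding_large_enough(
    effective_binding_size: u64,
    dynamic_offset: u64,
    required_size: u64,
) -> bool {
    remaining_after_dynamic_offset(effective_binding_size, dynamic_offset)
        .map_or(false, |remaining| remaining >= required_size)
}
```

For a 1024-byte effective binding size, a dynamic offset of 1000 and a required size of 64, comparing 1024 against 64 directly would pass, while the offset-aware check fails because only 24 bytes remain.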
I will open an issue in the spec repo.
> Relatedly, this use case also highlights another limitation, but the spec doesn't seem to address it: For much the same reasons that it might be necessary to dynamically specify the beginning of a bound buffer range, it'll often be necessary to specify the end of the range.
I don't think we can do much about this, as far as I know none of the native APIs have this functionality.
> I can use a uniform to early exit those items in the shader, but still the pipeline won't know that those items aren't written to, and may consider that unused part of the buffer to be aliasing other buffer bindings, which would have performance (?) or correctness implications.
The spec requires us to validate this aliasing, so from a correctness perspective this shouldn't be an issue.
Thanks for looking at this!
> I don't think we can do much about this, as far as I know none of the native APIs have this functionality.
Ah, I'm not familiar with the newer underlying APIs. It seems like the buffer length needs to be dynamically clipped for the off-end case. How is that handled? Is it implicit?
It is implicit.
I solved the validation problem by making the buffer bigger. The simplest solution was to make the buffer double the binding size and leave the second half empty. Later I put other things in the second half and made the binding size as small as possible, so the space isn't wasted.
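For anyone sizing a buffer the same way, this is a sketch of the arithmetic behind the workaround, assuming the stricter check described earlier in the thread is the one in force. The helper name is hypothetical.

```rust
/// Smallest buffer size that keeps the whole binding in range for every
/// dynamic offset up to `max_dynamic_offset`.
/// Returns `None` on arithmetic overflow.
fn min_buffer_size_for_strict_check(
    binding_offset: u64,
    binding_size: u64,
    max_dynamic_offset: u64,
) -> Option<u64> {
    binding_offset
        .checked_add(binding_size)?
        .checked_add(max_dynamic_offset)
}
```

Using the figures from the error message (a binding of 3110400 bytes at offset 0 and a dynamic offset of 1572864), the buffer would need at least 4683264 bytes. Doubling the binding size covers any dynamic offset up to the binding size itself, and shrinking the binding size lowers the requirement by the same amount.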