Vulkan-ValidationLayers icon indicating copy to clipboard operation
Vulkan-ValidationLayers copied to clipboard

Spurious VUID-vkCmdBindDescriptorSets-pDescriptorSets-01979(ERROR / SPEC)

Open buzmeg opened this issue 1 year ago • 3 comments

Environment:

  • OS: Ubuntu 24.04.1 LTS (Linux hostname 6.8.0-45-generic # 45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux)
  • GPU and driver version: Intel UHD Graphics 620 (WHL GT2)
  • SDK or header version if building from repo: Vulkan Instance Version: 1.3.290
  • Options enabled (synchronization, best practices, etc.): Validation

Describe the Issue When handling dynamic offsets, this warning pops up for my code in Validation:

VUID-vkCmdBindDescriptorSets-pDescriptorSets-01979(ERROR / SPEC): msgNum: -1587959965 - Validation Error: [ VUID-vkCmdBindDescriptorSets-pDescriptorSets-01979 ] Object 0: handle = 0x7fff5400fce0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xb991fa0000000024, type = VK_OBJECT_TYPE_DESCRIPTOR_SET; Object 2: handle = 0x3a6cbb0000000025, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xa159a763 | vkCmdBindDescriptorSets(): pDynamicOffsets[0] is 256, which when added to the buffer descriptor's range (2048) and offset (0) is greater than the size of the buffer (2048) in descriptorSet #1 binding #0 descriptor[0]. The Vulkan spec states: For each dynamic uniform or storage buffer binding in pDescriptorSets, the sum of the effective offset and the range of the binding must be less than or equal to the size of the buffer (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdBindDescriptorSets-pDescriptorSets-01979)

There are only two uses of that 2048 constant in my code: the full size of the buffer for creation and the .range field of the VkDescriptorBufferInfo being fed to vkUpdateDescriptorSets to update the full size of that buffer. Both are perfectly consistent and throw no validation problems.

The problem occurs as soon as I use a non-zero in one of my pDynamicOffset values and call vkCmdBindDescriptorSets. That's when the validation layer complains.

The issue, as far as I can tell, is that the validation layer is attempting to calculate the end of descriptor and compare it to the VkBuffer size by calculating:

end_byte = .offset (from VkDescriptorBufferInfo) (0 in my case) + .range (from VkDescriptorBufferInfo) (full size of the buffer in my case) and pDynamicOffset[0]. This triggers the validation as anything + full range is bigger than full range.

This calculation is incorrect as .range (from VkDescriptorBufferInfo) is NOT the "range"/"width"/"byte stride" of the descriptor data. .range is the length to be updated by vkUpdateDescriptorSets and is NOT inherently connected to the memory size of a single instance of the descriptor data.

The problem appears to be that there is no explicit definition of descriptor data stride anywhere. That is totally implicit in how the end user adjusts the pDynamicOffset values (effectively defined by the GLSL declarations modulo minUniformBufferOffsetAlignment) . Consequently, you can specify a pDynamicOffset[0] that is right next to the .range boundary but less than the size of your descriptor data size as defined in the GLSL and happily run off the end of the VkBuffer (which is obviously an error).

Sadly, that means that there is no easy way to validate running off the end.

Attempting to make .range (from VkDescriptorBufferInfo) be the descriptor stride (which this ValidationLayer seems to want to do) would mean that you would have to vkUpdateDescriptorSets with every vkCmdBindDescriptorSets which would completely defeat the whole point of pDynamicOffsets.

Expected behavior

No validation issue.

As far as I can tell, the correct memory is being indexed. I can place a patterned sequence of data and move the pDynamicOffset around and I see the pattern in my debugPrintfEXT.

As such, the Intel graphics driver seems to concur with my interpretation rather than that of the validation layer. I have not checked against AMD or NVIDIA hardware yet.

The spec seems to be quite unclear on this. Quoting:

The effective offset used for dynamic uniform and storage buffer bindings is the sum of the relative offset taken from pDynamicOffsets, and the base address of the buffer plus base offset in the descriptor set. The range of the dynamic uniform and storage buffer bindings is the buffer range as specified in the descriptor set.

I suspect that the spec was written to mean that the effective offset must stay withing the range of the buffer. That is straightforward to check. I suspect that the intent was also that the "effective offset+byte stride of the descriptor data" was also expected to stay within the range of the buffer. That, however, is not possible to check in a straightforward way. Perhaps the SPIR-V can be read for the stride and then used against the validation.

The Vulkan problem is that nothing ever explicitly specifies the "byte stride of descriptor data". And I suspect that was intentional since pDynamicOffsets is a byte offset (stride is implicit) and not an index offset (which would need that stride defined somewhere).

I suspect that a vkCmdBindDescriptorSets2 would be required so that byte stride could be specified and pDynamicOffsets changed to a pDynamicIndicies.

Until then (or until VK_EXT_descriptor_buffer becomes part of baseline Vulkan), this validation layer needs to be reexamined.

Thanks.

buzmeg avatar Oct 04 '24 07:10 buzmeg

sorry will take a deep look next week hopefully, but can you make sure you are not using VK_WHOLE_SIZE (https://github.com/KhronosGroup/Vulkan-Guide/blob/main/chapters/descriptor_dynamic_offset.adoc) as this has been a source of confusion for many before

(only had time to skim this and this part of the spec is thick and I need to recache it in my brain)

spencer-lunarg avatar Oct 04 '24 17:10 spencer-lunarg

It also will be VERY helpful (and faster to fix) if you have a basic example of the various API calls used to get to this situation as we ultimately will need a test regardless to catch regressions and sometimes making the tests is 90% the work and the layer code is just a 2 line fix

spencer-lunarg avatar Oct 04 '24 17:10 spencer-lunarg

I am not using VK_WHOLE_SIZE (I am explicitly using the actual buffer size which is 2048) which is why I could track this down more strongly. However, I believe that this is also part of the problem with VK_WHOLE_SIZE.

The .range field is simply not meant to serve as a "stride" for vkCmdBindDescriptorSets yet the validation layers are trying to treat it as such. If that stride assumption is removed then .range only applies to the vkUpdateDescriptorSets call, and a lot of the problems with VK_WHOLE_SIZE also vaporize.

It also will be VERY helpful (and faster to fix) if you have a basic example of the various API calls used to get to this situation as we ultimately will need a test regardless to catch regressions and sometimes making the tests is 90% the work and the layer code is just a 2 line fix

That's going to be a lot harder as all of my code is in Zig and I'm already behind schedule from running this down. I would have to rip apart multi-threaded rendering code and a pretty complex shader. But, yes, I understand that the test case is more important than the fix. This may have to wait until a new batch of summer interns. :frowning_face:

I probably won't be able to get the code pulled out into an easy example for a while. I should be able to test against AMD and NVIDIA over the next couple of weeks. If I see any differences, I will report them.

buzmeg avatar Oct 04 '24 22:10 buzmeg