VK_ERROR_DEVICE_LOST when enabling descriptor indexing validation
Environment:
- OS: Windows 11
- GPU and driver version: AMD Radeon RX 6950 XT, 2.0.299
- SDK or header version if building from repo: 1.3.290.0
- Options enabled (synchronization, best practices, etc.): Descriptor indexing
Describe the Issue
I have two graphics pipelines, A and B. Pipeline A uses descriptor sets X and Y; pipeline B uses only set X. X and Y are bound to set numbers 0 and 1, respectively. Here's the code:
VkDescriptorSetLayout setLayouts[] { setLayoutX, setLayoutY };
VkPipelineLayoutCreateInfo pipelineLayoutCreateInfo {};
pipelineLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutCreateInfo.pSetLayouts = setLayouts;
pipelineLayoutCreateInfo.setLayoutCount = 2;
vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayoutA);
// create pipeline A using layout A
pipelineLayoutCreateInfo.setLayoutCount = 1;
vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayoutB);
// create pipeline B using layout B
VkDescriptorSet descriptorSets[] { setX, setY };
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayoutA, 0, 2, descriptorSets, 0, nullptr);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineA);
vkCmdDraw(...); // uses sets 0 and 1
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineB);
vkCmdDraw(...); // uses set 0
Submitting this command buffer results in VK_ERROR_DEVICE_LOST when descriptor indexing validation is enabled. With it disabled, there is no error. The error can also be avoided by creating pipeline B with pipeline layout A.
According to the spec, my usage is valid, because layouts A and B are compatible for set 0:
Two pipeline layouts are defined to be “compatible for push constants” if they were created with identical push constant ranges. Two pipeline layouts are defined to be “compatible for set N” if they were created with identically defined descriptor set layouts for sets zero through N, and if they were created with identical push constant ranges.
When binding a descriptor set (see Descriptor Set Binding) to set number N, a previously bound descriptor set bound with lower index M than N is disturbed if the pipeline layouts for set M and N are not compatible for set M. Otherwise, the bound descriptor set in M is not disturbed.
If, additionally, the previously bound descriptor set for set N was bound using a pipeline layout not compatible for set N, then all bindings in sets numbered greater than N are disturbed.
When binding a pipeline, the pipeline can correctly access any previously bound descriptor set N if it was bound with compatible pipeline layout for set N, and it was not disturbed.
Layout compatibility means that descriptor sets can be bound to a command buffer for use by any pipeline created with a compatible pipeline layout, and without having bound a particular pipeline first. It also means that descriptor sets can remain valid across a pipeline change, and the same resources will be accessible to the newly bound pipeline.
When a descriptor set is disturbed by binding descriptor sets, the disturbed set is considered to contain undefined descriptors bound with the same pipeline layout as the disturbing descriptor set.
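The rules quoted above can be checked mechanically against the command sequence in this report. Below is a minimal C++ sketch of the per-set state a command buffer tracks under those rules. It is an illustrative model, not driver or GPU-AV code: `Layout`, `BoundSet`, `BindState`, `disturb`, and the integer ids are hypothetical, "identically defined" set layouts are approximated as equal ids, and push constant ranges are collapsed into a single id. Replaying the report's sequence, set 0 bound through layout A remains accessible to pipeline B, which is consistent with the reporter's reading that the usage is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified model (assumption): a pipeline layout is the list of its
// descriptor set layout ids plus one id standing for its push constant ranges.
// Equal ids stand in for "identically defined"; real Vulkan compares definitions.
struct Layout {
    std::vector<int> setLayouts;   // index = set number
    int pushConstantRangesId = 0;  // 0 = no push constant ranges
};

// "Compatible for set N": identical set layouts for sets 0..N and identical
// push constant ranges. Both layouts must contain set N.
bool compatibleForSet(const Layout& a, const Layout& b, uint32_t n) {
    if (a.setLayouts.size() <= n || b.setLayouts.size() <= n) return false;
    for (uint32_t i = 0; i <= n; ++i)
        if (a.setLayouts[i] != b.setLayouts[i]) return false;
    return a.pushConstantRangesId == b.pushConstantRangesId;
}

// What the command buffer remembers for one set number.
struct BoundSet {
    int descriptorSet;        // hypothetical descriptor set id
    const Layout* boundWith;  // layout passed to the bind call
    bool disturbed = false;
};

// A disturbed set holds undefined descriptors and counts as bound with the
// disturbing layout.
void disturb(BoundSet& s, const Layout& by) {
    s.disturbed = true;
    s.boundWith = &by;
}

struct BindState {
    std::vector<std::optional<BoundSet>> sets;

    // Models vkCmdBindDescriptorSets(cmd, bindPoint, layout, firstSet, count, ...).
    void bind(const Layout& layout, uint32_t firstSet, const std::vector<int>& descriptorSets) {
        const uint32_t end = firstSet + static_cast<uint32_t>(descriptorSets.size());
        if (sets.size() < end) sets.resize(end);
        // A lower set M is disturbed unless the layouts are compatible for set M.
        for (uint32_t m = 0; m < firstSet; ++m)
            if (sets[m] && !compatibleForSet(*sets[m]->boundWith, layout, m))
                disturb(*sets[m], layout);
        for (uint32_t n = firstSet; n < end; ++n) {
            // A previous binding of set N through an incompatible layout
            // disturbs every set numbered above N.
            if (sets[n] && !compatibleForSet(*sets[n]->boundWith, layout, n))
                for (std::size_t k = n + 1; k < sets.size(); ++k)
                    if (sets[k]) disturb(*sets[k], layout);
            sets[n] = BoundSet{descriptorSets[n - firstSet], &layout, false};
        }
    }

    // A pipeline created with pipelineLayout can access set N if it was bound
    // through a layout compatible for set N and has not been disturbed.
    bool canAccess(const Layout& pipelineLayout, uint32_t n) const {
        return n < sets.size() && sets[n] && !sets[n]->disturbed &&
               compatibleForSet(*sets[n]->boundWith, pipelineLayout, n);
    }
};

// Replays the command sequence from the report.
void replayReportedScenario() {
    const int X = 1, Y = 2;          // hypothetical ids for setLayoutX / setLayoutY
    const int setX = 10, setY = 20;  // hypothetical ids for setX / setY
    const Layout layoutA{{X, Y}};
    const Layout layoutB{{X}};
    assert(compatibleForSet(layoutA, layoutB, 0));   // same set 0 layout
    assert(!compatibleForSet(layoutA, layoutB, 1));  // B has no set 1

    BindState state;
    state.bind(layoutA, 0, {setX, setY});
    assert(state.canAccess(layoutA, 0) && state.canAccess(layoutA, 1));  // draw with pipeline A
    assert(state.canAccess(layoutB, 0));  // draw with pipeline B: set 0 still valid
}
```

By contrast, rebinding set 0 through a layout whose set 0 differs (for example, a hypothetical layout `{Z}`) marks set 1 as disturbed, so a later draw could no longer rely on it. Rebinding set 0 through layout B instead leaves set 1 intact, because A and B are compatible for set 0.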
Expected behavior
The device should not be lost.
Additional context
There are no validation errors prior to VK_ERROR_DEVICE_LOST.
@Trider12 thanks for reporting this. We are currently doing heavy work on GPU-AV and fixing it. Once I have the new descriptor indexing validation set up, I will come back and take a look, but hopefully it will "just be fixed" by then.
@Trider12 so I was able to reproduce the crash in https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/8535 (thanks for the simple breakdown of the tests)
So I see what is happening: underneath, GPU-AV mismatches the pipeline layout it uses and creates an invalid Vulkan flow, which causes the crash. Will try hard to get the fix in soon, before the next SDK!