Cheap layout compatibility checks

Open AnyOldName3 opened this issue 8 months ago • 24 comments

Pull Request Template

Description

Avoids disturbing necessary descriptor sets while still avoiding the need for full layout compatibility checks.

This is achieved with a basic vector<bool> saying which sets a pipeline layout has and doesn't have, so layouts it doesn't have won't be bound. This could mean different sets are bound than if a full compatibility check was done, but that should only happen when the scenegraph requested invalid API usage - there should only be one BindDescriptorSet for each slot at the top of the state stacks, and if it's for a slot the current pipeline statically uses, it must be compatible. Therefore, we only need to worry about compatibility checks for slots the current pipeline layout doesn't use, which the change here now does.

There's a slight caveat here - if a pipeline doesn't statically use any bindings in a descriptor set, it's legal to bind nothing to it, so someone might make a scenegraph that leaves its BindDescriptorSet out of a stategroup and inherets something incompatible. If a descriptor set is empty, it can't be statically used, so that's treated as if there's no descriptor set. Any handling beyond that would have a larger performance overhead, so I think use cases other than this aren't worth supporting directly. If someone wants to avoid binding useless descriptor sets, there are other means, e.g. adding a dummy statecommand or not having a pointless descriptor set in the first place.

This is based on https://github.com/vsg-dev/VulkanSceneGraph/pull/1438, even though technically it could be separated and used on its own. This is because my earlier attempts to fix the problem relied on that PR, and using the same baseline made comparing them more straightforward.

This fix is by far the fastest that helps every failure case I identified (before yesterday), with the performance impact of a worst-case scenario ranging from below the noise floor of my measurements to just 0.25% overhead depending on whether I cherry-pick a good run or my worst run. For a scenegraph that isn't designed specifically to make this have as large an impact as possible, it should be free - it does a few instructions extra work for each pipeline layout change and descriptor set binding, both of which take much longer than that already.

Fixes #1394

Type of change

Please delete options that are not relevant.

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update

How Has This Been Tested?

The main interesting test is the incompatible layout example here: https://github.com/vsg-dev/vsgExamples/compare/master...AnyOldName3:vsgExamples:incompatible-layouts. As discussed in #1394, it replicates the problem I initially discovered in a client app and several related situations that I discovered and thought were likely to have problems. There's lots of detail in this post https://github.com/vsg-dev/VulkanSceneGraph/issues/1394#issuecomment-2762170208. With the master branch, lots of validation errors are emitted in most test cases, but with this branch, it causes no errors at all.

Obviously, I've confirmed that this fixes the problem in the original client app.

As well as this, I've done performance testing. On my machine, the overhead is much smaller than the variation in framerate or execution time from run to run, so simply timing things didn't provide meaningful results. Instead, I found that running for a couple of minutes with a sampling-based profiler (I used the one built into Visual Studio) and comparing the percentage of samples in the modified functions was much more consistent. To get a worst-case scenario and maximise the observable overhead, I generated a scenegraph where the same basic cube is drawn thousands of times cycling between 32 different materials, with the number of draw commands chosen to give around a hundred frames per second. Trying fewer draws and more frames just made the noise from slow functions (particularly vkCmdBeginRenderPass and vkBeginCommandBuffer) overwhelm the overhead I was trying to measure, but trying a smaller number of slower frames meant they were called less and countered this out. Other variations on this style of scenegraph produced equivalent results. As mentioned above, if I cherry-pick the fastest baseline result and slowest result from this branch, and compare the number of samples in modified functions, that only accounts for 0.25% of all samples. If I don't artificially try and make things look as bad as possible, then the difference is too small to measure.

Checklist:

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works. They're on a vsgExamples branch - now the problem's fixed, it's not necessarily worth keeping the test app around.
[ ] New and existing unit tests pass locally with my changes
[x] Any dependent changes have been merged and published in downstream modules

May 14 '25 18:05 AnyOldName3