wgpu
wgpu copied to clipboard
[hal/vk] Rework Submission and Surface Synchronization
Connections
- Closes #5559
- Closes #5693
- Maybe #4919
- Maybe #5374
- Maybe #5491
- Probably not #3452.
Description
As described in #5559, our submission and surface synchronization was totally messed up. I've revamped it, made smarter classes to help keep all the bookkeeping straight, and generally make the code cleaner and easier to understand.
Testing
works on my machine lel
Works fine for me running the cube example with VK validation layers enabled
[2024-05-09T06:03:33Z INFO wgpu_core::instance] Adapter Vulkan AdapterInfo { name: "Intel(R) Arc(TM) Graphics", vendor: 32902, device: 32085, device_type: IntegratedGpu, driver: "Intel Corporation", driver_info: "101.5382", backend: Vulkan }
This fixes an annoying access violation termination (that I thought was a driver bug) on program exit after a present.
Unfortunately, it does not fix #5644 for me. Same constant validation error on presentation:
[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkAcquireNextImageKHR-semaphore-01779 (0x5717e75b)]
Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xba7514000000002a, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR(): Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)
[2024-05-09T08:16:24Z ERROR wgpu_hal::vulkan::instance] objects: (type: SEMAPHORE, hndl: 0xba7514000000002a, name: ?)
The actual validation error only happens with a debug
build, although release
feels much slower than 0.19
as well. I believe something is very slow and the errors actually only show up in debug
mode.
Flamegraph of cube
seems to indicate this—with present
taking a huge amount of time:
For comparison, same drivers with 0.19
:
Even if I enable Immediate
present mode, frames seem to take twice as much time to be rendered with 0.20
.
@hecrj could you try again with these latest commits, someone else who can reproduce the validation error says that this fixed it.
@cwfitzgerald Yep! The latest changes fix the validation errors. Thanks!
I'm still noticing a bit of worse performance in debug
mode, but it doesn't seem to be there in release
mode anymore. Flamegraphs look the same as before, but I imagine it's expected since now synchronization works properly and there may be more waiting. No errors in any case!
I'm actually rewriting these comments as I review so if you're not attached to them then don't worry too much about fixing anything.
Yeah that's fine - I was just trying to get something there.
@cwfitzgerald What do you think of this? https://github.com/gfx-rs/wgpu/commit/da543fb51
My hope was that this would make the states of RelaySemaphores
a bit clearer to a new reader. If people should not use wait
unless should_wait
is true, then let's actually force people to check that condition before they can even get at the value: classic Option
. But it does make advance
fallible and require us to pass the device.
[edit: made this a little nicer still: RelaySemaphoreState
is no longer necessary]
@cwfitzgerald Here's an even more radical suggestion: https://github.com/gfx-rs/wgpu/commit/b5d52fed45e4f89213e6fe5870e99126865711aa
Just use a single semaphore for all queue ordering.
@jimblandy i think we can go for it, the only question is if we want to still avoid this deadlock in anv mentioned in the old code, it if we consider that old enough to ignore.
/// It would be correct to use a single semaphore there, but
/// [Intel hangs in `anv_queue_finish`](https://gitlab.freedesktop.org/mesa/mesa/-/issues/5508).
I should have perseved that comment somehow.
@cwfitzgerald:
I should have perseved that comment somehow.
Ahh, yes, I was reviewing by checking out the branch and just wandering around with rust-analyzer, so I didn't notice that the patch removed the comment. If that's the sole rationale for the entire semaphore-swapping doohickey then we had definitely better have the comment.
Kvark filed that bug in Oct 2021. That's not that long ago, so I think we'd better keep it. It sounds like a bear to debug, and I'd rather not have to debug it again.
So I guess the remaining question is what you think of this one.
So here's the latest version of my suggestion: https://github.com/jimblandy/wgpu/compare/hal-vk-wait-semaphore-is-option%7E2...jimblandy:wgpu:hal-vk-wait-semaphore-is-option
Looks fine to me, could you either PR to my fork or just push to it?
I still want to get a handle on the swapchain semaphores. I've got some work-in-progress docs for those too. I'll take care of this tomorrow morning.
@cwfitzgerald I just pushed two doc commits. Could you look them over for accuracy?
rebased on current trunk