mpv icon indicating copy to clipboard operation
mpv copied to clipboard

wayland_vk: use FIFO if commit-timing and fifo protocols are available

Open Dudemanguy opened this issue 1 year ago • 9 comments

A very long time annoyance with wayland was compositors indefinitely blocking our vo thread if the surface gets occluded in some way. We've worked around this by using mailbox and our own custom vsync function. Thankfully it looks like people are finally solving this and with these two protocols it should be possible to guarantee forward progress on vulkan which means all the workarounds we do shouldn't be needed. So we can just request fifo in this case as a default since all we want is standard vsync blocking.

It's not tested at the moment, but SDL essentially does the same thing.

Dudemanguy avatar Oct 11 '24 20:10 Dudemanguy

We should probably wait a few months after a new Mesa release is cut with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26150

llyyr avatar Oct 11 '24 20:10 llyyr

I'm hoping compositor implementations would only signal clients if it knows the drivers can actually use the protocols. But I didn't really look at how it was implemented. We still have to wait for the next wayland-protocols release anyway.

Dudemanguy avatar Oct 11 '24 20:10 Dudemanguy

I'm hoping compositor implementations would only signal clients if it knows the drivers can actually use the protocols.

There is no way for compositors to do this. I've ranted at length about this here: https://github.com/libsdl-org/SDL/pull/10937#issuecomment-2384953649

I also don't see why the compositor should be in charge of telling an application what features a library supports. If application A wants to know what features library B supports, then it should ask library B instead of compositor C.

mahkoh avatar Oct 12 '24 08:10 mahkoh

Hmm that's unfortunate. I guess that means this will just have to wait some arbitrary time after the next mesa release. I guess the semantics of --wayland-disable-vsync should change a bit so there's a "force on" option as a fallback in case someone is on some really weird configuration like newish compositor version but old mesa.

Dudemanguy avatar Oct 12 '24 13:10 Dudemanguy

Reworked --wayland-disable-vsync to a --wayland-internal-vsync option so users have full control of the vsync behavior. Since 1.38 is out now, I bumped that as well so this is just blocked on the next mesa release as far as I'm concerned. We'll see what the final implementation will be but for now I also made using fifo depend on having wp_presentation_v2.

Dudemanguy avatar Oct 12 '24 15:10 Dudemanguy

It looks like the mesa implementation will work with presentation time v1 so I loosened up the restriction a bit.

Dudemanguy avatar Oct 15 '24 23:10 Dudemanguy

It looks like the mesa implementation will work with presentation time v1 so I loosened up the restriction a bit.

it'll work but only for fixed refresh rate outputs. It'll go back to the old behavior as soon as VRR is enabled. See: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26150/diffs?commit_id=e7aa09688bf20a5b4e664437f302762be606c783#8b9326d5e797a10c9d7bf33ddb5420c06e700768_2135_2270

implementing v2 is trivial for any compositor that'll go through the trouble of implementing fifo-v1 and commit-timing-v1, so I don't see any issue with requiring all 3

llyyr avatar Oct 16 '24 07:10 llyyr

Added it back in.

Dudemanguy avatar Oct 16 '24 12:10 Dudemanguy

The latest mesa supports this now, so this should be fairly safe to merge once we confirm a compositor implementation works as expected.

Dudemanguy avatar Nov 23 '24 19:11 Dudemanguy

This should wait until at least one of Mutter/Kwin/wlroots have finalized and merged their implementations.

llyyr avatar Dec 11 '24 17:12 llyyr

This should wait until at least one of Mutter/Kwin/wlroots have finalized and merged their implementations.

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

kasper93 avatar Dec 11 '24 17:12 kasper93

Nothing for us should change. But still I would prefer to make sure it works as expected before merging.

Dudemanguy avatar Dec 11 '24 17:12 Dudemanguy

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

I tested the wlroots implementation and it is objectively worse than mpv doing vsync internally. With this MR, mpv will default to using fifo/commit-timing if available which could be worse than the status quo.

If we have itchy fingers and want to merge it, please make it default to "yes" for now and change the default to auto later when we can test the finalized version of these implementations aren't worse than what mpv does.

llyyr avatar Dec 11 '24 17:12 llyyr

Adding on-ice label as it is blocked by compositor implementations apparently.

kasper93 avatar Dec 11 '24 17:12 kasper93

Do you expect any modifications on mpv side? For me it looks like we are dependent on protocol, so not sure why some implementations should block us.

I tested the wlroots implementation and it is objectively worse than mpv doing vsync internally. With this MR, mpv will default to using fifo/commit-timing if available which could be worse than the status quo.

If we have itchy fingers and want to merge it, please make it default to "yes" for now and change the default to auto later when we can test the finalized version of these implementations aren't worse than what mpv does.

I don't want to start a philosophical discussion, but my point is that protocols and specifications exist for a reason (we implement according to protocol, not specific compositor). If a compositor's implementation is incomplete or bugged, it is not mpv's issue, nor should it be a concern for us in any way (it's on them to enable when ready). mpv provides a way to disable the feature if users encounter problems. Of course, I agree that waiting is a valid option, and I'm not pushing to merge anything, but I want to point out that mpv shouldn't be hindered by third-party issues. Also, delaying features only postpones testing in the wild, which could otherwise happen sooner. Imagine that compositor developers could use mpv to validate their implementation, but we are shifting the responsibility the other way around willingly.

kasper93 avatar Dec 11 '24 18:12 kasper93

my point is that protocols and specifications exist for a reason (we implement according to protocol, not specific compositor).

I agree completely, but we're not talking about specific compositor here. No compositor is implementing is correctly (or at all) in a released version right now. There are also edge cases with the protocol that need to be ironed out and we don't know how it might affect mpv (though it shouldn't). Once we default to using fifo/commit-timing, we lose control over changing the default until the next release cycle which in mpv terms means 6 months.

I want to point out that mpv shouldn't be hindered by third-party issues.

Feel free to merge this, I'm not blocking it. My only concern was that it could potentially be a regression but if we don't care about it then LGTM

Imagine that compositor developers could use mpv to validate their implementation, but we are shifting the responsibility the other way around willingly.

You already can without this PR, with --wayland-disable-vsync=no --vulkan-swap-mode=fifo. Test clients [1] [2] [3] also exist for this specific purpose. This PR only makes it use fifo by default.

Anyone building a compositor to test this in the wild will be fine passing two arguments to mpv.

llyyr avatar Dec 11 '24 19:12 llyyr

No compositor is implementing is correctly (or at all) in a released version right now.

If you know of a bug in https://github.com/mahkoh/jay feel free to open an issue. That being said, I don't see how fifo could ever have better results than the existing mailbox code.

mahkoh avatar Dec 11 '24 19:12 mahkoh

If you know of a bug in https://github.com/mahkoh/jay feel free to open an issue.

Occluded mpv surface progressed completely unthrottled on Jay, but that was last month and I haven't tested since then. I intended to report this but it slipped my mind. I'll open an issue if it's still a problem tomorrow or day after. thx for the work.

llyyr avatar Dec 11 '24 19:12 llyyr

Occluded mpv surface progressed completely unthrottled on Jay

Hmm, that is surprising. In my implementation I have diverged a bit from the mainstream and made it so that even wp-fifo alone should be enough to throttle presentation to 40hz.

When I tested this mpv branch, I believe that I also confirmed that the throttling was working correctly. I have repeated this test just now with vkcube-wayland and it looks like it is still working:

~$ WAYLAND_DEBUG=1 vkcube-wayland 2>&1 | rg --line-buffered set_barrier | sed -u -E 's/^(.*?).......\].*/\1/' | uniq -c
     44 [3826
     40 [3827
     40 [3828
     40 [3829
     40 [3830
     40 [3831
     40 [3832
     40 [3833

I extract the "second" component from the wayland debug output and count how many wp_fifo.set_barrier events occur with the same timestamp.

I'll open an issue if it's still a problem tomorrow or day after. thx for the work.

Please do if you have a reproducer.

mahkoh avatar Dec 11 '24 19:12 mahkoh

Please do if you have a reproducer.

I just tried it and this is a mpv bug related to suspended event. When mpv is suspended, it stops drawing new frames. I don't think this is appropriate when using fifo but I'm not sure how to fix it. This works fine when using legacy fifo or mailbox, but for some reason not with fifo-v1. mpv basically stops receiving any events in WAYLAND_DEBUG when it's not drawing frames when using fifo-v1.

Jay works perfectly if you specify --force-render or ignore the suspended xdg_toplevel event.

I'll let @Dudemanguy figure out what the right fix here is

llyyr avatar Dec 11 '24 22:12 llyyr

Well we certainly deliberately not draw any frames when mpv is suspended. But that shouldn't cause the surface to be permanently blocked. Or at least as far as I know. And when mpv is brought back into view, it should receive callbacks again, be visible and draw frames again. At least that's how it should work but maybe fifo + commit-timing changes this model in a way that makes this not viable.

Dudemanguy avatar Dec 11 '24 22:12 Dudemanguy

But that shouldn't cause the surface to be permanently blocked. Or at least as far as I know. And when mpv is brought back into view, it should receive callbacks again, be visible and draw frames again.

It does, however during that time mpv thinks it's progressing unthrottled with video-sync=display-resample, so it's possible to be a bug entirely with mpv. If mpv is "suspended" for 30 seconds with display-resample, it'll "present" frames fast enough to reach eof on a 24 minute file.

llyyr avatar Dec 12 '24 08:12 llyyr

Does that mean we have to throttle ourselves if we aren't drawing?

Dudemanguy avatar Dec 12 '24 14:12 Dudemanguy

Does that mean we have to throttle ourselves if we aren't drawing?

Answering my own question here, but it's "yes". I forgot that I encountered similar behavior on x11 (didn't throttle then but just rendered anyway).

Dudemanguy avatar Dec 12 '24 18:12 Dudemanguy

Fixed a bug where I forgot to default initialize wayland-internal-vsync to auto. mutter on master supports fifo and commit-timing. It just needs a small patch to advertise presentation time v2 (https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4227). With some testing, it seems to work fine.

Dudemanguy avatar Jan 23 '25 22:01 Dudemanguy

Since this Mesa MR, commit timing is no longer explicitly required so I have dropped that part of the check. Jay and Mutter work totally fine, so the only thing potentially blocking this is whatever happens on nvidia.

Dudemanguy avatar Feb 06 '25 21:02 Dudemanguy

So I did have a user report better behavior when directly using FIFO and bypassing our internal vsync thingy. It wasn't technically with mesa's newer fifo code path (the legacy one with frame callbacks) but I still trust that over my code. So mesa users > nvidia and this goes in. If something bad happens on nvidia, you can use --wayland-interval-vsync=yes and/or --vulkan-swap-mode=mailbox.

Dudemanguy avatar Feb 17 '25 17:02 Dudemanguy