gamescope WSI: QueuePresent: canBypassXWayland(): Use latency hiding & reduce redundant requests/replies

trafficstars

See: https://github.com/sharkautarch/gamescope-canBypassXWayland-jitter-tests/tree/main (the canBypassXWayland-v2-tests folder has the measurements for this PR) for some tests I've done measuring duration of the canBypassXWayland() calls within GamescopeWSILayer::VkDeviceOverrides::QueuePresentKHR() the data also includes the min/max jitter (jitter being the change in the durations of consecutive canBypassXWayland() calls) for every 32 calls made

From some quick estimates off of the data from running vkcube thru gamescope (in nested mode) at 60fps, this PR reduces the median min/max (within every 32 measurements) jitter from around .60ms to ~.20ms, reduces the overall peaks in min/max jitter, and reduces the average duration from about .70ms to about .20ms

Apr 06 '24 16:04 sharkautarch

I've just added another edit to this PR, adding a XcbFetch template class functor to try to make the caching decision logic cleaner. My idea behind this is:

Caching is now fully transparent: allowing the static xcb:: helper functions to be written as if all the work is being done serially.
The XcbFetch class doesn't add any runtime overhead due to the use of the constexpr constructor (the operator () method also gets inlined by the compiler)
Bonus feature: XcbFetch abstracts away the need to separately fetch the cookie and then the reply (while, as mentioned, allowing for transparent caching)

Let me know if the new edit is too complicated or something

Apr 20 '24 16:04 sharkautarch

Note that getCachedReply does have to do a copy with memcpy (just makes caching stuff WAY simpler), but, due to copy elision/RVO, that's actually the only place that does copying. (I did a little test just to confirm that having two separate returns as done in xcb::XcbFetch::operator() does not prevent RVO: https://godbolt.org/z/zj6WGjv1c)

Apr 29 '24 01:04 sharkautarch

Another update: In addition to reducing the code size down to 93 lines, I've also removed the copies done w/ memcpy, by using what is essentially a "non-owning" unique_ptr. I did this by adding a constructor for ReplyDeleter that causes the class to not do a free if it doesn't "own" the pointer.

I know that it is odd for a unique_ptr to not own the pointer it holds, BUT: if you look at part I.30 of the c++ core guidelines, they specifically mention that you can use unique_ptrs in this way, and that it may be preferable over manual memory management

We could handle this particular example by using unique_ptr with a special deleter that does nothing for cin, but that’s complicated for novices

May 12 '24 14:05 sharkautarch

Last edits I made: I simplified stuff even more. I also switched the thread_local bool variable guarding access to g_cache to instead be a normal variable holding the owning thread's id. This should have slightly less overhead than a thread_local variable, since thread_local variables for dynamically loaded shared libraries use the general dynamic TLS model which requires running __tls_get_addr(), whereas pthread_self() just does one register move (on x86_64, and probably similar for other archs) and pthread_equal() is also pretty fast.

May 16 '24 18:05 sharkautarch

Can also confirm this PR passed a few hours of OpenGL Linux native gaming.

May 19 '24 03:05 KyunLFA

Can also confirm this PR passed a few hours of OpenGL Linux native gaming.

Uhhh I just realized that you said OpenGL unless you are using Zink or something else that translates the opengl stuff to vulkan, the gamescope's WSI layer won't be loaded because gamescope WSI is a vulkan layer.

You can check if the gamescope WSI layer is running by looking for lines from the terminal output that start with [Gamescope WSI]. Where if you don't find any of the aforementioned lines, then the gamescope WSI layer didn't get loaded. (Note that you have to wait for both gamescope and the child process that it launches to start before you'll see any message about gamescope WSI)

Here's a link with info about the vulkan loader and vulkan layers: https://github.com/KhronosGroup/Vulkan-Loader/blob/main/docs/LoaderLayerInterface.md#layer-call-chains-and-distributed-dispatch

May 19 '24 21:05 sharkautarch

@sharkautarch Oh wow yes, that was an embarrassing miss on my side, so sorry about that misdirect. I didn't realize at the moment I was writing my previous message that this PR is about a Vulkan layer only. Anyway, Vulkan microtests (vkcube, vkgears) are working. Will test with the latest changes as well, hopefully I can have the opportunity to test some real Vulkan games next.

May 20 '24 19:05 KyunLFA

@sharkautarch Oh wow yes, that was an embarrassing miss on my side, so sorry about that misdirect. I didn't realize at the moment I was writing my previous message that this PR is about a Vulkan layer only. Anyway, Vulkan microtests (vkcube, vkgears) are working. Will test with the latest changes as well, hopefully I can have the opportunity to test some real Vulkan games next.

No worries, good to hear that tests so far are going well It's okay, we all get those minor mix ups sometimes. Can't make an omelette without causing a few segfaults

May 20 '24 21:05 sharkautarch

@sharkautarch

Just the FYI, it no longer works on current git HEAD. I got it working by adding

if (!gamescopeSurface->isWayland()) {
[insert_here]
}

To the line xcb::Prefetcher prefetcher(gamescopeSurface->connection, gamescopeSurface->window);

Take a look at https://github.com/KyunLFA/gamescope/commit/40bcd1e775898db3f813c510474c1048f5bf391c if interested and if you have any reservations. Found because I carry this MR in my fork :p

Jun 11 '24 06:06 KyunLFA

Replying to https://github.com/ValveSoftware/gamescope/pull/1231#issuecomment-2159913152

Thanks, I’ll do that soon :)

Jun 11 '24 13:06 sharkautarch

@KyunLFA

I updated the branch with a fix w/ your proposed fix

if (!gamescopeSurface->isWayland()) {
xcb::Prefetcher prefetcher(gamescopeSurface->connection, gamescopeSurface->window);
}

I think that the Xcb::Prefetcher object's destructor would run after leaving the if branch, because the if branch counts as a local scope So I instead wrote a function that returns a std::optional<Prefetcher> where it constructs the Prefetcher only when the passed boolean is true. This allows the Prefetcher object to have the right lifetime

Jun 19 '24 02:06 sharkautarch

gamescope gamescope copied to clipboard

WSI: QueuePresent: canBypassXWayland(): Use latency hiding & reduce redundant requests/replies

gamescope
gamescope copied to clipboard