gamescope
gamescope copied to clipboard
WSI: QueuePresent: canBypassXWayland(): Use latency hiding & reduce redundant requests/replies
See: https://github.com/sharkautarch/gamescope-canBypassXWayland-jitter-tests/tree/main
(the canBypassXWayland-v2-tests
folder has the measurements for this PR)
for some tests I've done measuring duration of the canBypassXWayland() calls within GamescopeWSILayer::VkDeviceOverrides::QueuePresentKHR()
the data also includes the min/max jitter (jitter being the change in the durations of consecutive canBypassXWayland() calls) for every 32 calls made
From some quick estimates off of the data from running vkcube thru gamescope (in nested mode) at 60fps, this PR reduces the median min/max (within every 32 measurements) jitter from around .60ms to ~.20ms, reduces the overall peaks in min/max jitter, and reduces the average duration from about .70ms to about .20ms
I've just added another edit to this PR, adding a XcbFetch
template class functor to try to make the caching decision logic cleaner.
My idea behind this is:
- Caching is now fully transparent: allowing the static
xcb::
helper functions to be written as if all the work is being done serially. - The
XcbFetch
class doesn't add any runtime overhead due to the use of theconstexpr
constructor (theoperator ()
method also gets inlined by the compiler) - Bonus feature:
XcbFetch
abstracts away the need to separately fetch the cookie and then the reply (while, as mentioned, allowing for transparent caching)
Let me know if the new edit is too complicated or something
Note that getCachedReply
does have to do a copy with memcpy
(just makes caching stuff WAY simpler), but, due to copy elision/RVO, that's actually the only place that does copying.
(I did a little test just to confirm that having two separate returns as done in xcb::XcbFetch::operator()
does not prevent RVO: https://godbolt.org/z/zj6WGjv1c)
Another update: In addition to reducing the code size down to 93 lines, I've also removed the copies done w/ memcpy
, by using what is essentially a "non-owning" unique_ptr
. I did this by adding a constructor for ReplyDeleter
that causes the class to not do a free
if it doesn't "own" the pointer.
I know that it is odd for a unique_ptr to not own the pointer it holds, BUT:
if you look at part I.30 of the c++ core guidelines, they specifically mention that you can use unique_ptr
s in this way, and that it may be preferable over manual memory management
We could handle this particular example by using unique_ptr with a special deleter that does nothing for cin, but that’s complicated for novices
Last edits I made:
I simplified stuff even more.
I also switched the thread_local bool
variable guarding access to g_cache
to instead be a normal variable holding the owning thread's id. This should have slightly less overhead than a thread_local
variable, since thread_local
variables for dynamically loaded shared libraries use the general dynamic TLS model which requires running __tls_get_addr()
, whereas pthread_self()
just does one register move (on x86_64, and probably similar for other archs) and pthread_equal()
is also pretty fast.
Can also confirm this PR passed a few hours of OpenGL Linux native gaming.
Can also confirm this PR passed a few hours of OpenGL Linux native gaming.
Uhhh I just realized that you said OpenGL unless you are using Zink or something else that translates the opengl stuff to vulkan, the gamescope's WSI layer won't be loaded because gamescope WSI is a vulkan layer.
You can check if the gamescope WSI layer is running by looking for lines from the terminal output that start with [Gamescope WSI]
. Where if you don't find any of the aforementioned lines, then the gamescope WSI layer didn't get loaded. (Note that you have to wait for both gamescope and the child process that it launches to start before you'll see any message about gamescope WSI)
Here's a link with info about the vulkan loader and vulkan layers: https://github.com/KhronosGroup/Vulkan-Loader/blob/main/docs/LoaderLayerInterface.md#layer-call-chains-and-distributed-dispatch
@sharkautarch Oh wow yes, that was an embarrassing miss on my side, so sorry about that misdirect. I didn't realize at the moment I was writing my previous message that this PR is about a Vulkan layer only. Anyway, Vulkan microtests (vkcube, vkgears) are working. Will test with the latest changes as well, hopefully I can have the opportunity to test some real Vulkan games next.
@sharkautarch Oh wow yes, that was an embarrassing miss on my side, so sorry about that misdirect. I didn't realize at the moment I was writing my previous message that this PR is about a Vulkan layer only. Anyway, Vulkan microtests (vkcube, vkgears) are working. Will test with the latest changes as well, hopefully I can have the opportunity to test some real Vulkan games next.
No worries, good to hear that tests so far are going well It's okay, we all get those minor mix ups sometimes. Can't make an omelette without causing a few segfaults
@sharkautarch
Just the FYI, it no longer works on current git HEAD. I got it working by adding
if (!gamescopeSurface->isWayland()) {
[insert_here]
}
To the line xcb::Prefetcher prefetcher(gamescopeSurface->connection, gamescopeSurface->window);
Take a look at https://github.com/KyunLFA/gamescope/commit/40bcd1e775898db3f813c510474c1048f5bf391c if interested and if you have any reservations. Found because I carry this MR in my fork :p
Replying to https://github.com/ValveSoftware/gamescope/pull/1231#issuecomment-2159913152
Thanks, I’ll do that soon :)
@KyunLFA
I updated the branch with a fix w/ your proposed fix
if (!gamescopeSurface->isWayland()) {
xcb::Prefetcher prefetcher(gamescopeSurface->connection, gamescopeSurface->window);
}
I think that the Xcb::Prefetcher
object's destructor would run after leaving the if branch, because the if branch counts as a local scope
So I instead wrote a function that returns a std::optional<Prefetcher>
where it constructs the Prefetcher only when the passed boolean is true. This allows the Prefetcher
object to have the right lifetime