gamescope
gamescope copied to clipboard
steamcompmgr, wlserver: fix various data races found w/ thread sanitizer
For testing w/ TSAN, I added on the meson flags -Db_sanitize=thread --buildtype=debugoptimized
when compiling gamescope
I've only done tests w/ running gamescope in nested mode on my X11 host desktop, so this PR might need testing on a wayland host w/ the gamescope wayland backend.
I used the following command to test the latest commit from master w/ TSAN:
TSAN_OPTIONS="suppressions=${HOME}/gamescope/suppress_mangoapp" build/src/gamescope -- vkcube |& tee gamescope_tsan.log
note that I had to suppress mangoapp_update()
(the suppress_mangoapp
files is only one line: race:mangoapp_update
) to avoid the log getting filled with messages about it
I observed at least three unique data race messages, occurring after gamescope + vkcube were actively running (so after initialization), that would reliably get triggered whenever I did the following in order:
- Cause gamescope's window(s) to go out-of-focus
- wait
- bring the window back into focus
- move the cursor around a bit
SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:337 in bump_input_counter
SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:2398 in wlserver_mousebutton(int, bool, unsigned int)
SUMMARY: ThreadSanitizer: data race ../src/wlserver.cpp:2379 in wlserver_mousewarp(double, double, unsigned int, bool)
Full log: https://gist.github.com/sharkautarch/4a8cf292d8b70d133904708ac29465ed
While there were a lot of TSAN warnings which could be false positives and/or non-issues, the data races addressed by this PR seemed like a bigger deal
~~EDIT:~~
~~I just realized that if (bPrevState & suspended)
should be if (!bPrevState & suspended)
OOPSIES~~
~~Will fix that soon~~
^FIXED
Also, some explanations for some of the weird things in this PR:
- I had to make
bCursorHidden
bestd::atomic<char>
in order to be able to use atomicfetch_or
. This should be safe because I made sure thatbCursorHidden
is always initialized and assignedbool
values which means that the variable will always be either 1 or 0 - ~~I use
atomic_ref
w/inputCounter
instead of a normalatomic
becauseinputCounter
is passed into an Xlib function. That means that the Xlib function is doing a non-atomic write toinputCounter
, but TSAN doesn’t seem to complain, so either it’s a false negative or the threads just never actually race at that spot… I’ve been using gcc’s thread sanitizer, so maybe this might warrant checking w/ clang’s thread sanitizer…!~~ - This PR uses a lot of acquire/release memory ordering. This is just to minimize any performance impact, since this PR adds a significant amount of atomic ops. I’m pretty sure that only two threads could simultaneously access memory (at least in the areas that this PR deals with) so, from what I understand, using acquire/release should be fine
~~EDIT: for the second bullet point, now that I think about it, Xlib is a shared library, which means that the only way for TSAN to catch races from Xlib function calls would be to rebuild Xlib w/ TSAN. So I think I’ll need to edit this PR again~~ ~~^FIXED~~ ^Just realized that XChangeProperty doesn't modify its input data (or at least I don't think it does). I updated the PR again
~~last update:
to deal with the one remaining edge case I saw, I changed all instances of~~
~~wlserver.bCursorHidden.store(!wlserver.bCursorHasImage, std::memory_order_release);
~~
to:
~~std::atomic<char> atomicTrue{true};
~~
~~atomicTrue.compare_exchange_strong(wlserver.bCursorHidden, wlserver.bCursorHasImage);
~~
~~which makes the line a single CAS RMW operation~~
~~Not sure if the change is actually necessary,~~
~~and the compare-and-exchange is definitely slower than a load+store.~~
~~So let me know whether you think I should undo that...~~
~~(edit: for now I've decided to change compare_exchange_strong
to compare_exchange_weak
, tho this only makes a difference w/ non-x86 architectures)~~
^ I undid the CAS thing because I realized it didn't work the way I was thinking it would