Make dark/light work, fix race conditions
We only used the dark theme while the code already contained some preparation to respect the user settings of whether apps should appear dark or light.
While working on that, I realized that window messages and hooked events are processed asynchronously which results in quite some race conditions making the app unstable.
Besides of that I also tried to get the GDI drawing a little less brittle. Not sure if I understood the manpages correctly though.
5793ce2 is for ...
before:
after:
I don't get why it didn't draw the correct background. However, ...
[...] If hbrFlickerFreeDraw is NULL, the system draws the icon or cursor directly into the device context.
You don't need to worry about it since it is drawn off-screen anyway.
These are nice changes. The overlapping concerns (typo, dark/light drawing, GDI cleanup, locking, double-free, dark/light message, issue #15) makes it hard for me to review.
The purported race conditions surprise me. Your locks presume that WM_PAINT is sent to the switcher window outside of the keyboard hook. Did you confirm this? I'd love some reproduction steps for the races these locks prevent before accepting them.
Prompted by your remarks I'm reviewing thread interaction.
There are (unfortunately) three or four threads at play in cmdtab: The normal window/message loop, the keyboard hook, and two event hooks.
The two event hooks both simply call OnShellWindowActivated, which calls AddToHistory. If these two hooks are served by different threads then they will cause contention. I'm not sure how exactly SetWinEventHook schedules things. Maybe this is the cause of #16?
The keyboard hook is meant to drive the window, but I think you might be right that the keyboard hook ends up evoking the message thread, causing races.
I don't like the locking solution, but I haven't thought about how to restructure the hooks to avoid races. Refactoring the keyboard hook was at the top of my list before I put cmdtab aside.
It's not as much overlapping as you may think. Perhaps I should've explained better where I'm coming from.
The aim of this PR is to make switching of light/dark mode happen to work in a proper way. As an example, I have a little tool running that updates the settings depending on the local sunrise/sunset times. To get cmdtab responding to these updates, we need to process the WM_SETTINGCHANGE message and read the new setting from the registry. At that point I realized that the name of the registry value is wrong which leads to always fall back to dark, no matter what. So I had to update it. (The simplification of the function code is a side show.)
Receiving the WM_SETTINGCHANGE and the creation of a new off-screen window may occur in parallel. At that point I realized that we need to synchronize writing and reading of Config.darkmode to prevent data races. This is where the lock originally came into play.
While testing back and forth I realized that I didn't get any GUI approx. every tenth restart of cmdtab. Alt+Tab still switched to another window, though. I checked out main again, to see if it was caused by the updates I made so far. I've been still facing the same bug with the original code though. So, this may or may not be another appearance of #15.
Still to make light/dark mode happen to work reliably, I tried to find the culprit. That's the point where I updated the GDI code, where I enclosed more code into a critical section, and where I considered to initially create the off-screen drawing right after the window creation. The latter eventually fixed the bug for me.
However, you're asking why I believe the old code does already contain race conditions. It's the way how WM_PAINT is promoted to the window proc.
https://learn.microsoft.com/en-us/windows/win32/winmsg/about-messages-and-message-queues#queued-messages
[...] The WM_PAINT message, the WM_TIMER message, and the WM_QUIT message, however, are kept in the queue and are forwarded to the window procedure only when the queue contains no other messages. In addition, multiple WM_PAINT messages for the same window are combined into a single WM_PAINT message, consolidating all invalid parts of the client area into a single area. [...]
In other words, you can't predict when WM_PAINT is processed. So you may be about to change the off-screen drawing while BitBlt still copies it. Typically you're facing the impact of a race condition only occasionally (if ever). The (un)likelihood for perceiving the race condition is not a basis to consider it not being a race condition though. You could perform some risk assessment, but IMO you don't need to be worried about using the synchronization API. EnterCriticalSection and LeaveCriticalSection are only a few nanoseconds away. EnterCriticalSection may spin very few milliseconds to acquire the lock. However, if it ever starts spinning, then it's for a reason.
#16 is a different thing. I don't know if the updates in this PR have the potential to fix it. If you find the time, just check it out.
However, as mentioned, there's still the off-by-one issue.
Even though it's pretty much off-topic here, a bit of visualization might be helpful.
The current code for reference:
// Shift all hwnds down one position and then put 'hwnd' first
if (HistoryCount < countof(History)) {
memmove(&History[1], &History[0], sizeof History[0] * ++HistoryCount);
} else {
memmove(&History[1], &History[0], sizeof History[0] * HistoryCount);
}
History[0] = hwnd;
To keep the spot on the array boundaries in the ASCII graphics, let's assume History is declared as an array with only 4 elements.
static handle History[4];
In memory it looks like this, with the asterisks being placeholders for existing but not yet used elements.
offset 0 1 2 3
+---+---+---+---+
| * | * | * | * |
+---+---+---+---+
Offset 4 would be the end of the array and marks the imaginary element one beyond the array. The address is guaranteed to exist, at the same time we are not allowed to access the memory at this position (it's not even guaranteed that the address points to any existing memory).
if branch
It looks like so, once we are at the point where we collected 3 handles a, b, c:
offset 0 1 2 3
+---+---+---+---+
| a | b | c | * |
+---+---+---+---+
Now we want to add handle x. First (due to the prefix incrementation) HistoryCount becomes 4. Then we move a block of 4 consecutive elements from offset 0 to offset 1:
offset 0 1 2 3
+---+---+---+---+
|(a)| a | b | c | *
+---+---+---+---+
Lastly we update the handle at offset 0 to value x. However, we already violated the memory before by copying an unused element one position beyond the upper boundary of the array.
offset 0 1 2 3
+---+---+---+---+
| x | a | b | c | *
+---+---+---+---+
else branch
Each of the 4 elememts contains a handle (a, b, c, d):
offset 0 1 2 3
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
Now we want to add handle x. HistoryCount is 4. So we move 4 elements from offset 0 to offset 1:
offset 0 1 2 3
+---+---+---+---+
|(a)| a | b | c | d
+---+---+---+---+
Lastly we update the handle at offset 0 to value x. However, also in that case we already failed to respect the array's upper boundary before.
offset 0 1 2 3
+---+---+---+---+
| x | a | b | c | d
+---+---+---+---+
From f3947b5:
- don't ReleaseDC in in OnWindowPaint (hmm, why?)
handle context = BeginPaint(Switcher, &ps); ---> context gets a copy of ps.hdc (or NULL)
EndPaint(Switcher, &ps); ---> EndPaint releases the display device context that BeginPaint retrieved.
ReleaseDC(Switcher, context); ---> undefined behavior since context is already released, see above
The last push got a little messy. Just to clear things up: I decided to split the original ce55d50 into 3:
- key bleed fix (0d630f4)
- maintenance/various (e2e41f4)
- groupByApp (2062c2d).