os_thread_windows.c - get_rel_wait() will block if abstime is in the past
ISSUE: os_thread_windows.c - get_rel_wait() will block if abstime is in the past
Environment Information
n/a
Please provide a reproduction of the bug:
Code review
How often bug is revealed: (always, often, rare):
n/a
Actual behavior:
get_rel_wait() gets its abs_time argument from os_cond_timedwait(), which in turn receives it from the external code. So it appears to be possible to pass abs_time that is in the past, causing get_rel_wait() to end up subtracting larger value from a smaller one and returning a (very large) unsigned int.
Expected behavior:
Stumbled upon this randomly by skimming through the changes in pull request 4942, which was linked from the Static code analysis of the PMDK blog post, which in turn was mentioned in this tweet by one of the PVS-Studio devs.
I would guess that get_rel_wait() hanging for a very long time when passed a wrong argument is not a desirable behavior.
If someone passes abs_time that is in the past, it would obviously indicate an issue somewhere up the calling stack and so punishing them with an eternal wait may be an OK thing to do.
However the larger problem is that this patch reverses the API semantics. It used to return immediately on a "bad" argument and now it hangs near-indefinitely. Any code that was dependent on the past behavior will be effectively broken now.
There's no mention of this in the pull notes, so chances are the change was unintentional. Hence this ticket.
Details
n/a
Additional information about Priority and Help Requested:
Are you willing to submit a pull request with a proposed change?
Yes
Requested priority: (Showstopper, High, Medium, Low)
Low (?)
Given typedef unsigned long DWORD, the old implementation:
DWORD rel_wait = (DWORD)(ms - now_ms);
return rel_wait < 0 ? 0 : rel_wait;
and the new one:
return (DWORD)(ms - now_ms);
are functionally equivalent, because rel_wait < 0 is never true since the overflow on unsigned subtraction was already ignored in the line above.
But this doesn't change the fact that both implementations don't handle the overflow correctly, and we can definitely fix that.
Thanks for pointing this out.
Ah, you are right, of course. Half way through describing the issue I somehow assumed that the original version was doing the right thing, whereby it was clearly not.
Windows is not supported any more.