STL
STL copied to clipboard
<condition_variable>: std::condition_variable::wait_for() returns bogus std::cv_status::no_timeout status
Describe the bug
Port of DevCom-193041
int do_wait(_Cnd_t cond, _Mtx_t mtx, const xtime *target) in cond.c uses a heuristic _Xtime_diff_to_millis2 to figure out if a timeout occurred. It generates many false std::cv_status::no_timeout. Instead, the function should be using the result of the implementation's call to SleepConditionVariableCS to return the correct status. This flaw causes the user to falsely detect spurious wakeups by looking at the std::cv_status::no_timeout and the predicate. Where 'predicate' is whatever condition the user is trying to wakeup to.
Command-line test case
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <windows.h>
#include <thread>
int main() {
std::mutex m;
std::condition_variable c;
unsigned int bogusNoTimeoutCount = 0;
for (int i = 0; i < 200; ++i)
{
std::unique_lock<std::mutex> l{ m };
SetLastError(ERROR_SUCCESS);
// Note that we will only have proper timeouts or super rare spurious timeouts here.
auto s = c.wait_for(l, std::chrono::milliseconds{ 10 });
if (s == std::cv_status::no_timeout && GetLastError() == ERROR_TIMEOUT)
{
++bogusNoTimeoutCount;
}
}
std::cout << "bogusNoTimeoutCount: " << bogusNoTimeoutCount << std::endl;
return 0;
}
C:\temp>cl /MD /EHsc /W4 /WX .\repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29115 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
repro.cpp
Microsoft (R) Incremental Linker Version 14.28.29115.0
Copyright (C) Microsoft Corporation. All rights reserved.
/out:repro.exe
repro.obj
C:\temp>.\repro.exe
bogusNoTimeoutCount: 1
Running it again:
C:\temp>.\repro.exe
bogusNoTimeoutCount: 0
Expected Behavior No bogus timeout returned
STL version Microsoft Visual Studio Enterprise 2019 Preview Version 16.8.0 Preview 1.0
Additional context This is a port of Developer Community Feedback Ticket: The Visual C++ 2017 condition_variable implementation of wait_for returns bogus no_timeout status (DevCom-193041) (Internal VSO-564728)
I debug and find that the problem may be here. Call stl_condition_variable_concrt _CONCRTIMP bool wait_for(::Concurrency::critical_section& _Lck, unsigned int _Timeout = COOPERATIVE_TIMEOUT_INFINITE); return true?
It looks like this is by design. According to the C++11 spec (ISO_IEC_14882_2012), in [thread.condition.condvar], we have the following:
cv_status wait_until(unique_lock<mutex>& lock,
const chrono::time_point<Clock, Duration>& abs_time);
Effects:
- Atomically calls lock.unlock() and blocks on *this.
- When unblocked, calls lock.lock() (possibly blocking on the lock), then returns.
- The function will unblock when signaled by a call to notify_one(), a call to notify_all(), expiration of the absolute timeout (30.2.4) specified by abs_time, or spuriously.
- If the function exits via an exception, lock.lock() shall be called prior to exiting the function scope.
Postcondition: lock.owns_lock() is true and lock.mutex() is locked by the calling thread. Returns: cv_status::timeout if the absolute timeout (30.2.4) specified by abs_time expired, otherwise cv_status::no_timeout.
/end quote Therefore, the implementor is free to return spuriously, and when it does so, it has to return no_timeout. The tranditional recommendation is to use the predicate overload and get the value of an atomic protected by the same mutex if a 1:1 signal:receipt is required.
@mscottmueller The thing here is that a real timeout is sometimes reported as spurious wakeup which is wrong. It is perfectly normal for the code logic to have different behavior in case a timeout occurred.
@TheStormN Even if a wake-up occurred truly spuriously, caller has to verify what time is it.
@svorobiev So the predicate should also check for timing, along with it's main purpose? condition_variable should do that check, not the end user.
@TheStormN you are still subject to race condition. Library checked, it wasnt a timeout. Higher priority process ran for an hour. Context switch, call returned. User code is dealing with a lie.
For others who don't want the library to check the extra check is a violation of the zero overhead principle.
@svorobiev Obviously std::cv_status::no_timeout is there for a reason and SleepConditionVariableCS is a syscall which knows without racing if the reason is spurious wakeup or not. The only issue here is that it's result is ignored by the MS STL implementation. There is not much to debate here, the fix is simple.
It is now 2023, is there a fix for this upcoming anytime soon? The patch itself seems quite simple as mentioned by other replies here.
There is no ETA for vNext, sorry.
Hi, Any ETa for this issue?
No ETA. We've actually found that we were able to fix some condition_variable etc. bugs previously thought unfixable while preserving bincompat, by injecting new functions into the STL's import lib. However, I haven't done an analysis to figure out whether that would be possible here; we are severely capacity-constrained for the near-mid future.