STL icon indicating copy to clipboard operation
STL copied to clipboard

<condition_variable>: std::condition_variable::wait_for() returns bogus std::cv_status::no_timeout status

Open MahmoudGSaleh opened this issue 5 years ago • 11 comments

Describe the bug Port of DevCom-193041 int do_wait(_Cnd_t cond, _Mtx_t mtx, const xtime *target) in cond.c uses a heuristic _Xtime_diff_to_millis2 to figure out if a timeout occurred. It generates many false std::cv_status::no_timeout. Instead, the function should be using the result of the implementation's call to SleepConditionVariableCS to return the correct status. This flaw causes the user to falsely detect spurious wakeups by looking at the std::cv_status::no_timeout and the predicate. Where 'predicate' is whatever condition the user is trying to wakeup to.

Command-line test case

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <windows.h>
#include <thread>

int main() {
    std::mutex              m;
    std::condition_variable c;

    unsigned int bogusNoTimeoutCount = 0;
    for (int i = 0; i < 200; ++i)
    {
        std::unique_lock<std::mutex> l{ m };
        SetLastError(ERROR_SUCCESS);
        // Note that we will only have proper timeouts or super rare spurious timeouts here.
        auto s = c.wait_for(l, std::chrono::milliseconds{ 10 });
        if (s == std::cv_status::no_timeout && GetLastError() == ERROR_TIMEOUT)
        {
            ++bogusNoTimeoutCount;
        }
    }

    std::cout << "bogusNoTimeoutCount: " << bogusNoTimeoutCount << std::endl;

    return 0;
}
C:\temp>cl /MD /EHsc /W4 /WX .\repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29115 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 14.28.29115.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
repro.obj
C:\temp>.\repro.exe
bogusNoTimeoutCount: 1

Running it again:

C:\temp>.\repro.exe
bogusNoTimeoutCount: 0

Expected Behavior No bogus timeout returned

STL version Microsoft Visual Studio Enterprise 2019 Preview Version 16.8.0 Preview 1.0

Additional context This is a port of Developer Community Feedback Ticket: The Visual C++ 2017 condition_variable implementation of wait_for returns bogus no_timeout status (DevCom-193041) (Internal VSO-564728)

MahmoudGSaleh avatar Sep 01 '20 23:09 MahmoudGSaleh

I debug and find that the problem may be here. Call stl_condition_variable_concrt _CONCRTIMP bool wait_for(::Concurrency::critical_section& _Lck, unsigned int _Timeout = COOPERATIVE_TIMEOUT_INFINITE); return true?

butterflyy avatar Sep 09 '20 13:09 butterflyy

It looks like this is by design. According to the C++11 spec (ISO_IEC_14882_2012), in [thread.condition.condvar], we have the following:

cv_status wait_until(unique_lock<mutex>& lock,
const chrono::time_point<Clock, Duration>& abs_time);

Effects:

  • Atomically calls lock.unlock() and blocks on *this.
  • When unblocked, calls lock.lock() (possibly blocking on the lock), then returns.
  • The function will unblock when signaled by a call to notify_one(), a call to notify_all(), expiration of the absolute timeout (30.2.4) specified by abs_time, or spuriously.
  • If the function exits via an exception, lock.lock() shall be called prior to exiting the function scope.

Postcondition: lock.owns_lock() is true and lock.mutex() is locked by the calling thread. Returns: cv_status::timeout if the absolute timeout (30.2.4) specified by abs_time expired, otherwise cv_status::no_timeout.

/end quote Therefore, the implementor is free to return spuriously, and when it does so, it has to return no_timeout. The tranditional recommendation is to use the predicate overload and get the value of an atomic protected by the same mutex if a 1:1 signal:receipt is required.

mscottmueller avatar Sep 14 '20 19:09 mscottmueller

@mscottmueller The thing here is that a real timeout is sometimes reported as spurious wakeup which is wrong. It is perfectly normal for the code logic to have different behavior in case a timeout occurred.

TheStormN avatar Sep 18 '20 09:09 TheStormN

@TheStormN Even if a wake-up occurred truly spuriously, caller has to verify what time is it.

svorobiev avatar Oct 17 '20 20:10 svorobiev

@svorobiev So the predicate should also check for timing, along with it's main purpose? condition_variable should do that check, not the end user.

TheStormN avatar Oct 18 '20 12:10 TheStormN

@TheStormN you are still subject to race condition. Library checked, it wasnt a timeout. Higher priority process ran for an hour. Context switch, call returned. User code is dealing with a lie.

For others who don't want the library to check the extra check is a violation of the zero overhead principle.

svorobiev avatar Oct 20 '20 06:10 svorobiev

@svorobiev Obviously std::cv_status::no_timeout is there for a reason and SleepConditionVariableCS is a syscall which knows without racing if the reason is spurious wakeup or not. The only issue here is that it's result is ignored by the MS STL implementation. There is not much to debate here, the fix is simple.

TheStormN avatar Oct 20 '20 11:10 TheStormN

It is now 2023, is there a fix for this upcoming anytime soon? The patch itself seems quite simple as mentioned by other replies here.

RedFox20 avatar Apr 25 '23 20:04 RedFox20

There is no ETA for vNext, sorry.

StephanTLavavej avatar Apr 25 '23 21:04 StephanTLavavej

Hi, Any ETa for this issue?

agustinbcu01 avatar Apr 16 '24 17:04 agustinbcu01

No ETA. We've actually found that we were able to fix some condition_variable etc. bugs previously thought unfixable while preserving bincompat, by injecting new functions into the STL's import lib. However, I haven't done an analysis to figure out whether that would be possible here; we are severely capacity-constrained for the near-mid future.

StephanTLavavej avatar Apr 17 '24 23:04 StephanTLavavej