concurrentqueue icon indicating copy to clipboard operation
concurrentqueue copied to clipboard

Potential memory order inconsistency on freelist block reusage

Open ayles opened this issue 1 year ago • 9 comments

Hi! So I noticed that TSAN complains about races inside concurrentqueue. I was able to reproduce it with the following minimal example:

#include "concurrentqueue.h"

#include <thread>
#include <vector>

int main() {
    moodycamel::ConcurrentQueue<int> q;

    std::vector<std::thread> threads;

    for (size_t i = 0; i < 5; ++i) {
        threads.emplace_back([&q]() {
            while (true) {
                int item;
                q.try_dequeue(item);
            }
        });
    }

    for (size_t i = 0; i < 10; ++i) {
        threads.emplace_back([&q]() {
            while (true) {
                q.try_enqueue(1);
            }
        });
    }

    for (auto& t : threads) {
        t.join();
    }
}
...

WARNING: ThreadSanitizer: data race (pid=1709853)
  Write of size 4 at 0x728c00000718 by thread T14:
    #0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:2544:4
    #1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::inner_enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:1376:94
    #2 moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_enqueue(int&&) ... concurrentqueue.h:1074:15
    #3 main::$_1::operator()() const ... main.cpp:23:19
    
...

  Previous read of size 4 at 0x728c00000718 by thread T5:
    #0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<int>(int&) ... concurrentqueue.h:2596:17
    #1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<int>(int&) ... concurrentqueue.h:1720:50
    #2 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_dequeue<int>(int&) ... concurrentqueue.h:1146:32
    #3 main::$_0::operator()() const ... main.cpp:15:19

...

I believe this race (in terms of C++ memory model) happens like this: T0: Thread A reads from memory slot S. T1: Thread A releases memory slot S of block X. T2: Thread B releases last slot of block X, pushes block into the freelist. T3: Thread C acquires block X from the freelist, writes to slot S.

I guess release in this and similar places should be acq_rel (or release + acquire fence on the last increment): https://github.com/cameron314/concurrentqueue/blob/master/concurrentqueue.h#L1607 to build happens-before relation between the slot release (thread A) and the whole block release (thread B) and subsequent push to the freelist. (Tested it, TSAN stopped complaining).

I am not so good with C++ memory model, so maybe I am missing something here?

ayles avatar Dec 17 '24 16:12 ayles

@cameron314 This issue looks quite serious. To address it, I've submitted https://github.com/cameron314/concurrentqueue/pull/406, which is based on this commit. When you have a moment, could you please review the changes? Thank you for your time and consideration.

hs-vc avatar Jan 22 '25 08:01 hs-vc

I can confirm this TSan report indicates it thinks the block was not handed off cleanly in the FreeList between the two threads. (Line 1607 is irrelevant, as it's in code that's never executed with the default block size.)

It is not clear to me whether or not this is a false positive. More analysis is needed. At the root of the free list implementation are CAS operations on freeListHead, which is already done with release semantics when a block is put on the freelist and acquire semantics when a block is taken off. Therefore my first impression is that this is likely a false positive.

If there is a bug, blindly changing memory ordering until TSan stops complaining does not necessarily indicate the bug is fixed. Actual human logic needs to be involved.

cameron314 avatar Jan 26 '25 20:01 cameron314

Think I found one case where the freeListRefs may get temporarily manipulated by another thread than the one in add_knowing_refcount_is_zero, leading to a break in happens-before ordering with respect to that member (note that this should not affect the happens-before of putting a block on the list head and getting it back). Pushed e74b07d5878a10689f1999d0c3ab1aae4127f881.

cameron314 avatar Jan 26 '25 21:01 cameron314

Sorry, but I do not understand how line 1607 is irrelevant if it is executed (on my machine, with the example above).

And, yes, if thead B releases the block (after releasing the last element) and thread C acquires that block, there is definitely a clear happens-before relation between them.

However, when thread A releases a non-last element from that block, it does not interact with freeListRefs/freeListHead in any way. And the only place where threads A and B can establish happens-before relation is when they increment elementsCompletelyDequeued.

In fact, I do not believe in false positives with modern TSAN, unless they are extremly rare or thread fences are in use (ofc when talking about C++ memory model and not about actual races on a particular architecture).

ayles avatar Jan 27 '25 00:01 ayles

Forgot to mention that, as expected, https://github.com/cameron314/concurrentqueue/commit/e74b07d5878a10689f1999d0c3ab1aae4127f881 does not fix the problem.

ayles avatar Jan 27 '25 00:01 ayles

Ah, I misread your initial analysis. I see what you mean now. It's not about the thread freeing the block but about the other dequeue operations on that block before that. Thank you, this is helpful.

It seems I also reread the code too quickly. Line 1607 is skipped with the default block size for explicit producers, but not implicit ones.

Yes, the commit I pushed is for an unrelated potential issue.

cameron314 avatar Jan 27 '25 02:01 cameron314

I don't have TSan set up at the moment, please let me know if the latest commit fixes this particular warning.

Based on #406, I'm guessing there are others that are unrelated to this one.

cameron314 avatar Jan 27 '25 04:01 cameron314

I can't test this commit at the moment, but as I mentioned earlier, one change on line 1607 completely resolves the warnings in the example.

I suppose there might be a similar issue with flags in the if-branch as well.

Maybe it will be possible to introduce TSAN-tests in the future, starting with the example above?

ayles avatar Jan 27 '25 07:01 ayles

Also, in terms of performance, it may be better to use release memory order with acquire-fence on the last increment, as described here https://www.boost.org/doc/libs/1_53_0/doc/html/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters.discussion With a call to __tsan_acquire in TSAN-builds, ofc. (But the new conditional branch may cost a little, so it will require some benchmarking)

ayles avatar Jan 27 '25 07:01 ayles