SPSCQueue Assertion failed

Assertion failed

Open armintoepfer opened this issue 2 years ago • 12 comments

Hi!

I've been using SPSCQueue for some time and just updated to the latest version and now hit this:

SPSCQueue.hpp:188: void Rigtorp::SPSCQueue<T, Allocator>::pop() [with T = PacBio::CCS::Chunk; Allocator = std::allocator<PacBio::CCS::Chunk>]: Assertion `writeIdx_.load(std::memory_order_acquire) != readIdx' failed.

Any advice? Thank you!

Mar 31 '22 12:03 armintoepfer

To reproduce:

#include <complex>
#include <array>
#include "SPSCQueue.h"
int main() {
    rigtorp::SPSCQueue< int > queue(16);
    for (int j = 0; j < 6; ++j) {
        queue.emplace(0);
    }
    for (int j = 0; j < 3; ++j) {
        queue.pop();
    }
}

If I dump indexes after each operation this way

template<typename T>
void dump(T& q) {
    std::cout << q.writeIdx_ << " " << q.writeIdxCache_ << " " << q.readIdx_ << " " << q.readIdxCache_ << std::endl;
}

It is clear this is caused by the diverged non-atomic caches and the real values:

1 0 0 0 # push
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
6 0 1 0 # pop
6 0 2 0
6 0 3 0
6 0 3 0

If ensure the equality between the caches and the indexes in the end of the pop call, the problem disappears. Sure it's just a dirty fix because I wouldn't like to dig into the logic.

    readIdxCache_ = nextReadIdx;
    writeIdxCache_ = writeIdx_;

Sep 14 '22 17:09 galchinsky

@galchinsky On which CPU are you running? For example, Apple Silicon CPU behaves a bit differently than Intel, as I could experiment (SPSCQueue was not involved).

Sep 14 '22 17:09 Philippe91

@Philippe91 it's a simple single-threaded example with a few push and a few pop operations, I don't think CPU can make a difference here.

Sep 14 '22 17:09 galchinsky

I submitted a PR to fix this, see PR "Bug fix for #35".

Jan 17 '23 21:01 usefulcat

I submitted a PR to fix this, see PR "Bug fix for #35".

This solves the problem, apparently.

This being said, the logic behind the cached indexes system introduced by @rigtorp is challenging to grasp in my mind. It gives me an insecure feeling, reinforced by the presence of this bug, introduced 18 months ago, and to which the author has never reacted since it was reported ten months ago.

I am considering reverting to the implementation prior 3a0507a193b0573d87823759d5c8c1230598bfbc.

Jan 18 '23 16:01 Philippe91

I just moved to Facebook's queue, because who knows are there any other bugs like that

Jan 18 '23 16:01 galchinsky

I have the same problem.

#include <iostream>
#include "SPSCQueue.h"
using namespace rigtorp;
int main(int argc, char* argv[]) {
    SPSCQueue<int> q(10000);
    
    q.push(1);
    q.pop();
    
    return 0;
}

Output: Assertion failed: writeIdx_.load(std::memory_order_acquire) != readIdx, file D:\Programs\Test\test1\SPSCQueue.h, line 184

If add q.front(); before q.pop();, the code will run properly.

Apr 05 '23 08:04 jz315

And function front() has some bug.

RIGTORP_NODISCARD T* front() noexcept {
            auto const readIdx = readIdx_.load(std::memory_order_relaxed);
            if (readIdx == writeIdxCache_) {
                writeIdxCache_ = writeIdx_.load(std::memory_order_acquire);
                if (writeIdxCache_ == readIdx) {
                    return nullptr;
                }
            }
            return &slots_[readIdx + kPadding];
        }

If you run the pop() function first, readIdx_ will change from 0 to 1, while the initial value of writeIdxCache_ is 0, so they will be unequal and writeIdxCache_ will not update until readIdx_ reaches the end of the array and becomes 0 again. During this time, the front() function will always return an object regardless of whether the queue is empty.

Apr 05 '23 08:04 jz315

It's just an assertion fail in the class destructor because writeIdxCache_ value doesn't reflect writeIdx_ as it is only updated in the *front() member function when writeIdxCache_ == readIdx_.load(std::memory_order_relaxed) and only if it is so writeIdxCache_ gets updated, hence the fault. The sane fix is to check if size() > 0 in the class destructor loop.

Because you weren't supposed to call *front() when the size() is zero and that's what the class destructor does, it failed to check bounds as it should be.

May 04 '23 18:05 vetribox

There seems to be a misunderstanding of the pre-conditions of calling pop(). I updated the documentation here: https://github.com/rigtorp/SPSCQueue/commit/40661138be2245d7d52854e4fb3b7f6bb4bf02ad

This is a concurrent queue intended to be used by two threads. In that case the only way for the reader thread to know if there is more data to read is by calling front(). Only once it returns a non-nullptr can you call pop().

This code invokes undefined behavior (and triggers the assert):

SPSCQueue<int> q(10000);
q.push(1);
q.pop();

This code correctly checks that the queue is not empty:

SPSCQueue<int> q(10000);
q.push(1);
if (q.front()) {
   q.pop();
}

Since you are not reading front(), you are not communicating any data to the other thread, so I don't understand why you even need the queue?

Sep 25 '23 03:09 rigtorp

I don't remember the details (even where the assertion got triggered), but here are 3 chunks of code I had before moving to the facebook's queue, and checking for front after checking for emptiness or size doesn't seem obvious to me.

            while (!queue.empty()) {
                lapper.put_sample(*queue.front());
                queue.pop();
            }

another one

           while (queue.size() > 4) {
                queue.pop();
            }

and

            while (queue.front()) {
                DBOUT("$" << queue.writeIdx_ << " " << queue.writeIdxCache_ << " " << queue.readIdx_ << " " << queue.readIdxCache_);
                queue.pop();
            }

Apparently, given there are debug outputs, it was the last one that triggered the assertion. Pushing was being done this way, in a separate thread:

               bool success = queue.try_push(workbuf);
                if (success) {
                    last_timestamp = buf.timestamp;
                }

Sep 25 '23 10:09 galchinsky

Yes, those first two are invalid and will break the queue invariants.

I really should never have used the front() and pop() APi from std::queue. Instead they should be a single transaction try_pop() that can return a value. Or a zero-copy version that visits the value in-place using a lambda.

On Mon, Sep 25, 2023 at 5:43 AM Dmitry Galchinsky @.***> wrote:

I don't remember the details (even where the assertion got triggered), but here are 3 chunks of code I had before moving to the facebook's queue, and checking for front after checking for emptiness or size doesn't seem obvious to me.
        while (!queue.empty()) {
            lapper.put_sample(*queue.front());
            queue.pop();
        }
another one
       while (queue.size() > 4) {
            queue.pop();
        }
and
        while (queue.front()) {
            DBOUT("$" << queue.writeIdx_ << " " << queue.writeIdxCache_ << " " << queue.readIdx_ << " " << queue.readIdxCache_);
            queue.pop();
        }
Apparently, given there are debug outputs, it was the last one that triggered the assertion. Pushing was being done this way, in a separate thread:
           bool success = queue.try_push(workbuf);
            if (success) {
                last_timestamp = buf.timestamp;
            }
— Reply to this email directly, view it on GitHub https://github.com/rigtorp/SPSCQueue/issues/35#issuecomment-1733414452, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABLO26TGO4YASZ6VM3UXY3X4FN55ANCNFSM5SFF52EQ . You are receiving this because you were mentioned.Message ID: @.***>

Sep 26 '23 00:09 rigtorp

SPSCQueue SPSCQueue copied to clipboard

Assertion failed

SPSCQueue
SPSCQueue copied to clipboard