libcoro
libcoro copied to clipboard
Possible race condition
I believe there is a race condition in coro::ring_buffer between the await_ready and await_suspend methods. The lock is released in await_ready and re-acquired in await_suspend. When a reader and a writer are running on independent threads, this provides a window where say a reader's try_read_locked returns false, releases the lock, but before that read operation re-acquires the lock in the await_suspend method, a writer grabs the lock and adds data to the ring buffer, potentially putting the ring buffer in a full state.
I was able to deadlock the ring buffer with a capacity of 2 with one reader and one writer each running on distinct threads.
I have made modifications to your implementation which enables rescheduling the awaiting reader/writer on the same thread pool from which they performed the await read/write, but I don't think those changes are at play in this bug, but I have not ruled them out.
I'll try to submit a PR for my fix in your original code base.
Awesome, good find and I appreciate the explanation. I'll await the PR.
Sorry, I will get to this. I've been on paternity leave.
Hey no prob, I appreciate you jumping in, take your time. Congratulations 🎉
I think we just need to recheck if there is anything in await_suspend() since there is the two function calls? Unconditionally sleeping I think is the problem since something can be added to the ring buffer like you say between these two calls.
@ryanolson I've got a PR up that I think fixes this, working on adding a test to reproduce on the old code as well.
I don't think my test case is reproducing because the consumer task jumps pools, I'll need to explicitly use std::thread probably to reproduce.
@ryanolson I've managed to reproduce the test on the main branch, and it appears to be passing now, please take a look if you have sometime, thanks.