libcoro icon indicating copy to clipboard operation
libcoro copied to clipboard

Possible race condition

Open ryanolson opened this issue 3 years ago • 1 comments

I believe there is a race condition in coro::ring_buffer between the await_ready and await_suspend methods. The lock is released in await_ready and re-acquired in await_suspend. When a reader and a writer are running on independent threads, this provides a window where say a reader's try_read_locked returns false, releases the lock, but before that read operation re-acquires the lock in the await_suspend method, a writer grabs the lock and adds data to the ring buffer, potentially putting the ring buffer in a full state.

I was able to deadlock the ring buffer with a capacity of 2 with one reader and one writer each running on distinct threads.

I have made modifications to your implementation which enables rescheduling the awaiting reader/writer on the same thread pool from which they performed the await read/write, but I don't think those changes are at play in this bug, but I have not ruled them out.

I'll try to submit a PR for my fix in your original code base.

ryanolson avatar Sep 27 '22 20:09 ryanolson

Awesome, good find and I appreciate the explanation. I'll await the PR.

jbaldwin avatar Sep 28 '22 23:09 jbaldwin

Sorry, I will get to this. I've been on paternity leave.

ryanolson avatar Dec 05 '22 05:12 ryanolson

Hey no prob, I appreciate you jumping in, take your time. Congratulations 🎉

jbaldwin avatar Dec 05 '22 23:12 jbaldwin

I think we just need to recheck if there is anything in await_suspend() since there is the two function calls? Unconditionally sleeping I think is the problem since something can be added to the ring buffer like you say between these two calls.

jbaldwin avatar Feb 25 '23 17:02 jbaldwin

@ryanolson I've got a PR up that I think fixes this, working on adding a test to reproduce on the old code as well.

jbaldwin avatar Feb 25 '23 22:02 jbaldwin

I don't think my test case is reproducing because the consumer task jumps pools, I'll need to explicitly use std::thread probably to reproduce.

jbaldwin avatar Feb 25 '23 23:02 jbaldwin

@ryanolson I've managed to reproduce the test on the main branch, and it appears to be passing now, please take a look if you have sometime, thanks.

jbaldwin avatar Feb 26 '23 14:02 jbaldwin