tokio-uring icon indicating copy to clipboard operation
tokio-uring copied to clipboard

Bug: Race condition in next_when_notified

Open ollie-etl opened this issue 1 year ago • 1 comments

I believe the following fixes a race condition in FixedBufPool. We were observing a situation where, without resorting to timeouts to force retries, multiple concurrent calls to next would eventually all stall.

The fix (setting enable) is take directly from https://docs.rs/tokio/latest/tokio/sync/futures/struct.Notified.html#method.enable

The call to enable is important because otherwise if you have two calls to recv and two calls to send in parallel, the following could happen:

  1. Both calls to try_recv return None.
  2. Both new elements are added to the vector.
  3. The notify_one method is called twice, adding only a single permit to the Notify.
  4. Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.

By adding the Notified futures to the list by calling enable before try_recv, the notify_one calls in step three would remove the futures from the list and mark them notified instead of adding a permit to the Notify. This ensures that both futures are woken.

I believe this is happening in the presence of only concurrency, not parallelism (single threaded)

ollie-etl avatar Sep 14 '23 10:09 ollie-etl

@mzabaluev you're probably best placed for an opinion

ollie-etl avatar Sep 14 '23 11:09 ollie-etl