crossbeam icon indicating copy to clipboard operation
crossbeam copied to clipboard

Fallback to Parking in Bounded and Unbounded Channels

Open ibraheemdev opened this issue 1 year ago • 6 comments

Currently, the queues used by the bounded and unbounded channels (also ArrayQueue and SegQueue) are not lock-free. The linearization point of a send or receive is moving the tail or head index respectively, meaning that if a sender or receiver sees the head/tail has moved but the value has not been written yet, they must spin until the corresponding receive or send completes that frees up space in the channel. Blocking in such a case is not possible, as sends might notify a receiver even though they are not visible due to a lagging sender holding up the channel, or vice versa, leading to missed wakeups.

The current solution of unbounded spinning can lead to issues, especially on platforms with a single core or custom thread priorities. One way to allow falling back to blocking is if waiters that see more values in the channel after being woken up notify any other waiters, making up for any missed notifications. However, this ends up being quite expensive and can lead to many unnecessary wakeups. A solution to this is wakeup throttling as implemented by tokio and other userspace schedulers, which avoids notifications if there is already an active thread, the "waker thread". That thread will then notify another waiter if it finds a message, creating a new waker thread. This PR changes the bounded and unbounded channels to fallback to blocking after observing a linearizability condition, using wakeup throttling to makeup for missed notifications without creating too many unnecessary wakeups.

Thanks to @kprotty for pointing me in the right direction for the wakeup throttling algorithm. Hopefully paired with https://github.com/crossbeam-rs/crossbeam/pull/1038 this should resolve all of the spinning issues in crossbeam/std.

Should resolve https://github.com/crossbeam-rs/crossbeam/issues/366 and https://github.com/crossbeam-rs/crossbeam/issues/997 (as well as https://github.com/rust-lang/rust/issues/114851 and https://github.com/rust-lang/rust/issues/112723 when upstreamed to std).

ibraheemdev avatar May 02 '24 01:05 ibraheemdev

We probably also want to introduce a non-linearizable version of try_recv, because this doesn't actually fix the issue of try_recv taking longer than expected.

ibraheemdev avatar May 02 '24 04:05 ibraheemdev

Here are the benchmark results before and after. Note that all of the fallback cases are very unlikely to be hit on most setups, so this is just measuring the performance difference of new wakeup algorithm, which seems to be pretty mixed.

- test bounded_1::create    ... bench:          91 ns/iter (+/- 0)
+ test bounded_1::create    ... bench:          82 ns/iter (+/- 3)
- test bounded_1::mpmc      ... bench:  10,576,639 ns/iter (+/- 1,012,158)
+ test bounded_1::mpmc      ... bench:  11,206,415 ns/iter (+/- 2,787,440)
- test bounded_1::mpsc      ... bench:  21,551,625 ns/iter (+/- 975,344)
+ test bounded_1::mpsc      ... bench:  21,975,220 ns/iter (+/- 390,863)
- test bounded_1::oneshot   ... bench:         116 ns/iter (+/- 1)
+ test bounded_1::oneshot   ... bench:         104 ns/iter (+/- 3)
- test bounded_1::spmc      ... bench:  22,192,680 ns/iter (+/- 760,426)
+ test bounded_1::spmc      ... bench:  22,268,657 ns/iter (+/- 559,440)
- test bounded_1::spsc      ... bench:  23,097,047 ns/iter (+/- 76,889)
+ test bounded_1::spsc      ... bench:  22,753,449 ns/iter (+/- 143,048)
- test bounded_n::mpmc      ... bench:   2,387,457 ns/iter (+/- 313,821)
+ test bounded_n::mpmc      ... bench:   2,734,135 ns/iter (+/- 315,206)
- test bounded_n::mpsc      ... bench:   5,909,059 ns/iter (+/- 81,411)
+ test bounded_n::mpsc      ... bench:   6,635,209 ns/iter (+/- 134,351)
- test bounded_n::par_inout ... bench:   6,835,077 ns/iter (+/- 234,552)
+ test bounded_n::par_inout ... bench:   6,583,975 ns/iter (+/- 210,953)
- test bounded_n::spmc      ... bench:   5,649,130 ns/iter (+/- 127,809)
+ test bounded_n::spmc      ... bench:   5,748,161 ns/iter (+/- 130,954)
- test bounded_n::spsc      ... bench:   1,126,902 ns/iter (+/- 32,449)
+ test bounded_n::spsc      ... bench:   1,194,757 ns/iter (+/- 15,458)
- test unbounded::create    ... bench:          64 ns/iter (+/- 1)
+ test unbounded::create    ... bench:          57 ns/iter (+/- 1)
- test unbounded::inout     ... bench:          36 ns/iter (+/- 0)
+ test unbounded::inout     ... bench:          44 ns/iter (+/- 0)
- test unbounded::mpmc      ... bench:   1,181,564 ns/iter (+/- 204,692)
+ test unbounded::mpmc      ... bench:   1,200,760 ns/iter (+/- 174,606)
- test unbounded::mpsc      ... bench:   2,900,684 ns/iter (+/- 196,933)
+ test unbounded::mpsc      ... bench:   2,144,163 ns/iter (+/- 199,247)
- test unbounded::oneshot   ... bench:         103 ns/iter (+/- 0)
+ test unbounded::oneshot   ... bench:         113 ns/iter (+/- 0)
- test unbounded::par_inout ... bench:   4,799,151 ns/iter (+/- 487,975)
+ test unbounded::par_inout ... bench:   6,485,203 ns/iter (+/- 407,363)
- test unbounded::spmc      ... bench:   2,366,594 ns/iter (+/- 46,477)
+ test unbounded::spmc      ... bench:   2,014,760 ns/iter (+/- 28,476)
- test unbounded::spsc      ... bench:   1,061,444 ns/iter (+/- 14,589)
+ test unbounded::spsc      ... bench:   1,392,352 ns/iter (+/- 11,792)

ibraheemdev avatar May 02 '24 05:05 ibraheemdev

There are some changes I made on the rust-lang PR that have to be applied to this one as well.

I believe there's a large optimization here than can be made which is only notifying a waiter if the current thread sees remaining data on the channel.

ibraheemdev avatar Jul 11 '24 02:07 ibraheemdev