
Prevent deadlock if sender/receiver is forgotten

sbarral opened this pull request 2 years ago

Fixes https://github.com/smol-rs/async-channel/issues/45.

This is mostly similar to PR https://github.com/smol-rs/async-channel/pull/16 by another author, but its motivation is correctness rather than the purported performance advantage. Differences from PR https://github.com/smol-rs/async-channel/pull/16:

  • it adds correctness tests for the case of forgotten sender and forgotten receiver,
  • the commit is based on 1.7.0,
  • the listener is not discarded when the future completes: as far as I understand, discarding it would only help in the rare case where a sender/receiver completes after a spurious wake-up (without consuming its notification); if that is correct, the fixed cost of the extra conditional does not pull its weight (see the sketch below for the general listen/retry pattern involved).
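For readers less familiar with the internals, here is a minimal, self-contained sketch of the listen/try/retry pattern that this kind of channel builds on top of the event-listener 2.x API. The `Channel` type, its mutex-protected queue, and the `send_ops` event are simplifications of mine, not the crate's actual code; the point is only to show where a listener is created, awaited, and simply dropped on completion rather than explicitly discarded:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

use event_listener::{Event, EventListener};

// Stand-in for the real lock-free channel state (simplified assumption).
struct Channel<T> {
    queue: Mutex<VecDeque<T>>,
    capacity: usize,
    send_ops: Event, // notified when a slot frees up
}

impl<T> Channel<T> {
    async fn send(&self, msg: T) {
        let mut msg = Some(msg);
        let mut listener: Option<EventListener> = None;
        loop {
            // Try the operation first.
            {
                let mut queue = self.queue.lock().unwrap();
                if queue.len() < self.capacity {
                    queue.push_back(msg.take().unwrap());
                    // In the real channel, receivers would be notified here.
                    // Any listener registered below is simply dropped when we
                    // return; it is not explicitly discarded.
                    return;
                }
            }
            // Channel is full: register a listener if we don't already hold
            // one, otherwise wait for a notification and retry.
            match listener.take() {
                None => listener = Some(self.send_ops.listen()),
                Some(l) => l.await,
            }
        }
    }
}
```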

I ran benchmarks in various scenarios and did not observe a meaningful difference in performance.

That being said, we cannot rule out degraded performance in extreme cases. In theory this could happen with a very large number of fast senders combined with a fast receiver loop: a receiver that rapidly empties the channel would generate many notifications, but if the first sender consumes all the capacity before the remaining senders are polled, most of those notifications would be wasted. A symmetric situation could occur with many receivers and a fast sender loop.

sbarral avatar Aug 09 '22 12:08 sbarral

Here are the results of the only benchmark that showed a significant difference.

This benchmark is a beast so I can't easily share it (I made it for my own purposes), but in a nutshell it is a "funnel" benchmark where each receiver is connected to 13 senders, each sending a very small payload (a usize) in a tight loop, while the receiver does essentially nothing except receive these payloads in a tight loop. The benchmark runs 61 such channels simultaneously to ensure that all executor threads are busy at all times. The figures below were obtained with tokio using 4 threads; a rough sketch of this kind of setup follows.
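For illustration, a stripped-down, single-channel version of such a funnel could look like the sketch below. This is not the actual benchmark; the message count, channel capacity, and runtime setup here are arbitrary placeholders:

```rust
use std::time::Instant;

const SENDERS: usize = 13;
const MSGS_PER_SENDER: usize = 100_000;

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    let capacity = 100;
    let (tx, rx) = async_channel::bounded::<usize>(capacity);

    let start = Instant::now();

    // 13 senders pushing small payloads in tight loops.
    for id in 0..SENDERS {
        let tx = tx.clone();
        tokio::spawn(async move {
            for _ in 0..MSGS_PER_SENDER {
                tx.send(id).await.unwrap();
            }
        });
    }
    // Drop the original sender so the channel closes once all tasks finish.
    drop(tx);

    // The receiver does nothing but drain the channel in a tight loop.
    let mut received = 0usize;
    while rx.recv().await.is_ok() {
        received += 1;
    }

    let elapsed = start.elapsed();
    println!(
        "{} messages in {:?} ({:.3} msg/µs)",
        received,
        elapsed,
        received as f64 / elapsed.as_micros() as f64
    );
}
```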

Number of messages per microsecond as a function of channel capacity before the commit (higher is better):

        capacity=1        2.390 msg/µs
        capacity=10      13.094 msg/µs
        capacity=100     24.756 msg/µs
        capacity=1000    28.006 msg/µs
        capacity=10000   28.008 msg/µs

The same after the commit:

        capacity=1        2.342 msg/µs
        capacity=10       4.964 msg/µs
        capacity=100     30.280 msg/µs
        capacity=1000    30.839 msg/µs
        capacity=10000   33.092 msg/µs

So it's a mixed bag really; overall one could even claim that performance is better after the commit, but in my opinion "funnel-type" benchmarks are a bit artificial anyway (it is unusual to have both a very fast receiver and very fast senders), so I would not put too much weight on these results.

sbarral avatar Aug 09 '22 13:08 sbarral

I hit the bug that this fixes. It drove me nuts for days and was tough to find. Please merge this!

unixpickle avatar Sep 24 '22 06:09 unixpickle

@sbarral Do you have any other benchmarks, e.g. basic SPSC or MPSC? I'd like to make sure this PR doesn't regress the common case before I merge it.

notgull avatar Sep 27 '22 14:09 notgull

@sbarral Do you have any other benchmarks, e.g. basic SPSC or MPSC? I'd like to make sure this PR doesn't regress the common case before I merge it.

The "funnel" benchmark mentioned above is what I would think of as the archetypical MPSC benchmark, and is indeed the one that shows that there is some trade-off.

Truth be told, I didn't really expect any difference in the SPSC case, since there can then be only one registered listener on either side (1 sender/1 receiver). But just to be sure, I changed the parameters of the "funnel" benchmark to use a single sender, making it effectively an SPSC. This test spawns 61 sender/receiver pairs on tokio and measures the throughput.

Number of messages per microsecond as a function of channel capacity on v1.7.1, before the commit (higher is better):

capacity=1        4.257 msg/µs
capacity=10      15.226 msg/µs
capacity=100     22.409 msg/µs
capacity=1000    24.989 msg/µs
capacity=10000   24.930 msg/µs

The same after the commit:

capacity=1        4.353 msg/µs
capacity=10      18.416 msg/µs
capacity=100     30.980 msg/µs
capacity=1000    32.917 msg/µs
capacity=10000   32.954 msg/µs

So that's a nice surprise... My best guess is that this comes from the deletion of the additional notification you inquired about. In the SPSC case this extra notification is never needed, but even when there is no interested listener it introduces a full fence (see here). That would also explain why there is no difference for capacity=1, since that case was special-cased before the commit to avoid the additional notification.

sbarral avatar Sep 27 '22 15:09 sbarral

I don't see any reason not to merge this, thanks!

notgull avatar Sep 27 '22 18:09 notgull

While there is still time to revert this commit, I wanted to highlight a caveat which I initially missed so that the decision to proceed with this PR (or not) is fully informed.

While I continue to believe that this PR fixes a real flaw, I now realize that the current behaviour was superior in one respect: it guaranteed strict fairness (edit: not entirely true, see below) by making sure that senders and receivers were unblocked in the same order they were blocked. After this PR, they are still notified in fair order, but nothing guarantees that they will be scheduled fairly. Unfortunately, this strict fairness was also the direct cause of issue #45.

A nice alternative is tokio's MPSC strategy, which I only became aware of recently. It sends all notifications right away, as in this PR, and does not strictly enforce fairness, but it only lets a sender push a value if doing so leaves enough free slots for the senders that were notified earlier but not yet scheduled. This prevents deadlocks like the one in issue #45 while still providing fairness for all practical purposes (see the illustration below).
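To make the idea concrete, here is how that reservation semantics surfaces in tokio's public mpsc API: a sender first claims a free slot via reserve() and only then commits the value, so capacity handed out to notified-but-unscheduled senders cannot be stolen. This is just an illustration of the semantics, not a suggestion that async-channel could adopt the mechanism as-is:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<usize>(1);

    // `reserve` waits until a slot is free and returns a permit owning that
    // slot; sending through the permit is then infallible and synchronous.
    let permit = tx.reserve().await.expect("channel closed");
    permit.send(42);

    assert_eq!(rx.recv().await, Some(42));
}
```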

Now, I don't really see a way to implement tokio's strategy with the current architecture, so in practice we will probably have to choose between accepting easy deadlocks and giving up strict fairness.

Edit: worth noting, however, that the behaviour before this PR was not actually strictly fair either: it would still let a non-notified sender/receiver steal a slot from a notified sender/receiver.

sbarral avatar Oct 28 '22 10:10 sbarral