async-channel Optimize and simplify notification logic

Sep 22 '20 12:09 kalcutter

The new logic looks correct to me, but I'm not sure it is more efficient than what we had before:

Suppose there are 100 receivers blocked on an empty channel. Each receiver is consuming messages in a loop.
Then we send 100 messages into the channel at once, thus waking up 100 receiver tasks. It's very likely that the first receiver to wake up will consume more than one or maybe even all of the 100 queued messages.
Finally, the remaining 99 receivers wake up only to discover that the channel is empty again and go back to sleep. With the old logic, only 1 receiver would wake up and go back to sleep, while the other 98 would just keep sleeping.

Sep 22 '20 13:09 ghost

I guess it depends on the use case and what you are measuring (e.g. if you care more about throughput or latency). However, at least in a general sense, I feel that the new logic does less work and could respond faster. Are there benchmarks available that we can use to compare relevant use cases? Otherwise I can write a few basic benchmarks to compare.

I am not sure how relevant your example is since if one receiver is so fast (in the normal case) it can process a large portion of the messages before the other tasks can even wake up, then there seems to be too many tasks processing messages? On the other hand, if messages continued to come in, the other tasks would already be awake and ready to receive new messages without having to wait on the notification "chain".

Sep 22 '20 14:09 kalcutter

I don't have any benchmarks for channel - it would be good to write some and see the impact. My hypothesis is that this change would consume more CPU due to unnecessary wakeups. The current logic is also used in async-executor for waking up executor threads when new tasks are scheduled (the task queue is basically a channel), and that's how almost all popular executors work.

I am not sure how relevant your example is since if one receiver is so fast (in the normal case) it can process a large portion of the messages before the other tasks can even wake up, then there seems to be too many tasks processing messages?

The main question here is how many messages can be processed by a single task/receiver. And the answer is that we can't really know. This PR assumes that 1 message corresponds to 1 task/receiver, but it is often the case that receivers process small messages in batches.

In case of executors, it can be common for 10 small scheduled tasks to be processed quickly by a single thread, so it's not worth waking up all threads. Instead, we wake up just 1 additional thread in case help is needed, and if that thread picks up a task, then it wakes up another and so on. This adaptive strategy hits a sweet spot between latency, throughput, and CPU usage.

Here's an interesting comment in the Go runtime, which describes the wakeup strategy in this PR:

// 3. Unpark an additional thread whenever we ready a goroutine and there is an
//    idle P, but don't do handoff. This would lead to excessive thread parking/
//    unparking as the additional threads will instantly park without discovering
//    any work to do.

source: https://github.com/golang/go/blob/d42b32e321fa5c5d2c93b2ad22d48e804c9f45d2/src/runtime/proc.go#L49-L52

Sep 22 '20 15:09 ghost

I think this can be closed now that PR #49 has been merged.

Sep 27 '22 19:09 sbarral

async-channel async-channel copied to clipboard

Optimize and simplify notification logic

async-channel
async-channel copied to clipboard