async-executor
Added local queue scheduling and "next_task" optimization
Two major changes significantly improve performance:
- When `Executor::run()` is called, a handle to the local queue and ticker are cached into TLS. This lets tasks schedule to a thread-local queue rather than always to the global queue.
- Within the local queue, we implement a `next_task` optimization (see https://tokio.rs/blog/2019-10-scheduler) to greatly reduce context-switch costs in message-passing patterns. We never put the same task into `next_task` twice in a row, which avoids starvation.
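To make the two changes concrete, here is a minimal, std-only sketch of the idea. It is not the code in this PR: `TaskId`, `LocalQueue`, `enter_run`, `leave_run`, `schedule`, and `pop_local` are hypothetical names chosen for illustration, tasks are modeled as plain integers, and the starvation rule is implemented in one plausible way (a task that was just taken from the `next_task` slot goes to the back of the local queue if it reschedules itself).

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-in for a runnable task handle.
type TaskId = u32;

#[derive(Default)]
struct LocalQueue {
    // Slot for the most recently woken task; it runs before the FIFO queue.
    next_task: Option<TaskId>,
    // The task most recently taken out of the slot, if any.
    last_from_slot: Option<TaskId>,
    queue: VecDeque<TaskId>,
}

impl LocalQueue {
    fn push(&mut self, task: TaskId) {
        if self.last_from_slot == Some(task) {
            // Assumed starvation guard: the task that just ran from the slot
            // may not re-enter it immediately, so it waits in the FIFO queue.
            self.queue.push_back(task);
        } else if let Some(displaced) = self.next_task.replace(task) {
            // The slot was occupied; move its previous occupant to the queue.
            self.queue.push_back(displaced);
        }
    }

    fn pop(&mut self) -> Option<TaskId> {
        if let Some(task) = self.next_task.take() {
            self.last_from_slot = Some(task);
            Some(task)
        } else {
            self.last_from_slot = None;
            self.queue.pop_front()
        }
    }
}

thread_local! {
    // Cached local queue; `None` whenever this thread is not inside a run call.
    static LOCAL: RefCell<Option<LocalQueue>> = RefCell::new(None);
}

// Called when the (hypothetical) run loop starts on this thread.
fn enter_run() {
    LOCAL.with(|cell| *cell.borrow_mut() = Some(LocalQueue::default()));
}

// Called when the run loop exits; leftover tasks are handed to the global queue.
fn leave_run(global: &Mutex<VecDeque<TaskId>>) {
    if let Some(local) = LOCAL.with(|cell| cell.borrow_mut().take()) {
        let mut g = global.lock().unwrap();
        g.extend(local.next_task);
        g.extend(local.queue);
    }
}

// Schedule locally when a TLS queue is cached, otherwise fall back to global.
fn schedule(global: &Mutex<VecDeque<TaskId>>, task: TaskId) {
    let pushed_locally = LOCAL.with(|cell| {
        let mut slot = cell.borrow_mut();
        match slot.as_mut() {
            Some(local) => {
                local.push(task);
                true
            }
            None => false,
        }
    });
    if !pushed_locally {
        global.lock().unwrap().push_back(task);
    }
}

// Take the next task from this thread's local queue, if one is cached.
fn pop_local() -> Option<TaskId> {
    LOCAL.with(|cell| cell.borrow_mut().as_mut().and_then(|local| local.pop()))
}
```

In this sketch a freshly woken task jumps ahead of everything else in the local queue, which is what keeps ping-pong style message passing on one thread. The guard in `push` prevents a task that keeps waking itself from monopolizing the slot, and `schedule` falls back to the global queue whenever no local queue is cached.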
Through both unit testing and production deployment in https://github.com/geph-official/geph4, whose QUIC-like sosistab protocol is structured in an actor-like fashion that greatly stresses the scheduler, I see significant improvements in real-world throughput (up to 30%, and this is in a server dominated by cryptography CPU usage) and massive improvements in microbenchmarks (up to 10x faster in the `yield_now` benchmark and similar context-switch benchmarks). I see no downsides: the code should gracefully fall back to pushing to the global queue if, for example, nesting `Executor`s invalidates the TLS cache.
I also added criterion benchmarks.