Completion-based I/O
Modern high-performance I/O APIs (e.g. io_uring and I/O completion ports) are completion-oriented, unlike the traditional readiness-oriented epoll paradigm. Advantages include fewer syscalls and no copying. On Windows in particular, readiness-oriented I/O is poorly supported.
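To make the difference in API shape concrete, here is a minimal sketch using only the standard library. Everything in it (`ReadinessRecv`, `CompletionRecv`, `MockSocket`, `deliver`) is a hypothetical name invented for illustration. It is an in-memory stand-in for the kernel, not io_uring, IOCP, or anything in Quinn.

```rust
use std::collections::VecDeque;
use std::io;

/// Readiness style: the caller keeps its buffer and retries after `WouldBlock`.
pub trait ReadinessRecv {
    fn try_recv(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// Completion style: the caller hands a buffer over and later gets it back,
/// filled, together with the number of bytes received.
pub trait CompletionRecv {
    fn submit_recv(&mut self, buf: Vec<u8>);
    fn reap(&mut self) -> Option<(Vec<u8>, usize)>;
}

/// Hypothetical in-memory "socket" standing in for the kernel.
#[derive(Default)]
pub struct MockSocket {
    incoming: VecDeque<Vec<u8>>, // datagrams that have "arrived"
    posted: VecDeque<Vec<u8>>,   // buffers currently owned by the "kernel"
}

impl MockSocket {
    /// Simulate the arrival of one datagram.
    pub fn deliver(&mut self, datagram: &[u8]) {
        self.incoming.push_back(datagram.to_vec());
    }
}

impl ReadinessRecv for MockSocket {
    fn try_recv(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.incoming.pop_front() {
            Some(d) => {
                // Datagrams are all-or-nothing: excess bytes are dropped.
                let n = d.len().min(buf.len());
                buf[..n].copy_from_slice(&d[..n]);
                Ok(n)
            }
            // Nothing queued: the caller must wait for readiness and retry.
            None => Err(io::ErrorKind::WouldBlock.into()),
        }
    }
}

impl CompletionRecv for MockSocket {
    fn submit_recv(&mut self, buf: Vec<u8>) {
        // From here until `reap` returns it, the caller cannot touch `buf`.
        self.posted.push_back(buf);
    }

    fn reap(&mut self) -> Option<(Vec<u8>, usize)> {
        if self.incoming.is_empty() || self.posted.is_empty() {
            return None;
        }
        let d = self.incoming.pop_front()?;
        let mut buf = self.posted.pop_front()?;
        // The mock copies here; a real completion API would have the kernel
        // write into the posted buffer directly.
        let n = d.len().min(buf.len());
        buf[..n].copy_from_slice(&d[..n]);
        Some((buf, n))
    }
}
```

The point of the sketch is the signatures: `try_recv` borrows the buffer for the duration of one call, while `submit_recv` takes ownership and `reap` gives it back, which is why buffer ownership becomes the main design question with completion-based I/O.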
@Matthias247 has reported that a prototype variant of Quinn modified to use registered I/O on Windows performs significantly (~20%?) better than our Linux backend using sendmmsg and recvmmsg for efficient batching, and drastically better than the fallback backend on Windows. Due to major changes in tokio/mio's Windows support, we should re-evaluate the latter result after moving to tokio 0.3.
On Linux, io_uring is only available on recent kernels. Other platforms (e.g. macOS, BSDs) may not offer completion-based I/O at all. Retaining a readiness-oriented fallback will therefore remain necessary for some time. However, the performance benefits seem to justify making completion-based I/O the first-class target.
A complicating factor is that tokio itself is, currently, 100% readiness-oriented. On Linux, we may be able to bridge this gap gracefully due to io_uring/epoll interop, but it's unclear if something similar is possible on Windows. If not, a background thread may be necessary.
I pushed my POC for RIO and completion-based I/O to https://github.com/quinn-rs/quinn/pull/918
This might help with getting an idea of what is necessary. I think in a model where Quinn owns all I/O and the associated thread, it's not terribly complicated. The fact that datagrams are received and transmitted in an all-or-nothing fashion, and that it mostly has to deal with a single socket rather than lots of them, makes it easier than e.g. implementing support for completion-oriented TCP. But obviously one still has to be a bit careful around ownership.
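As a rough illustration of the "one thread owns all I/O" model and the ownership care it needs, here is a standard-library-only sketch. It is not taken from the PR above: `IoThread`, `Completion`, `spawn`, `submit`, and `wait` are hypothetical names, and the thread emulates completions by running a blocking receive closure rather than using RIO or io_uring.

```rust
use std::io;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Outcome of one receive: the buffer returns to the caller with the result.
pub struct Completion {
    pub buf: Vec<u8>,
    pub result: io::Result<usize>,
}

/// Handle to a hypothetical dedicated I/O thread.
pub struct IoThread {
    submit: Sender<Vec<u8>>,
    done: Receiver<Completion>,
}

impl IoThread {
    /// `recv` stands in for the real socket operation and is an assumption
    /// of this sketch; the spawned thread is the only place it ever runs.
    pub fn spawn<F>(mut recv: F) -> Self
    where
        F: FnMut(&mut [u8]) -> io::Result<usize> + Send + 'static,
    {
        let (submit, submitted) = channel::<Vec<u8>>();
        let (completed, done) = channel::<Completion>();
        thread::spawn(move || {
            // The loop ends once the handle, and with it `submit`, is dropped.
            for mut buf in submitted {
                let result = recv(&mut buf);
                if completed.send(Completion { buf, result }).is_err() {
                    break; // nobody is waiting for completions any more
                }
            }
        });
        IoThread { submit, done }
    }

    /// Hand a buffer to the I/O thread by value. If the thread is gone,
    /// the buffer is returned in the error.
    pub fn submit(&self, buf: Vec<u8>) -> Result<(), Vec<u8>> {
        self.submit.send(buf).map_err(|e| e.0)
    }

    /// Block until the next completion; `None` if the I/O thread has exited.
    pub fn wait(&self) -> Option<Completion> {
        self.done.recv().ok()
    }
}
```

With a connected `std::net::UdpSocket` one could pass `move |buf: &mut [u8]| socket.recv(buf)` as the closure. Moving the `Vec<u8>` through the channels means the compiler enforces that nothing else touches a buffer while an operation is in flight, which is the ownership concern mentioned above.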
I wonder if there are any plans about using io_uring? Have you considered integrating DataDog's glommio crate? Also, nowadays it looks like another way to proceed might be using AF_XDP, which might be simpler(?) in a way.
See also discussion at https://github.com/quinn-rs/quinn/issues/1319. This issue was originally opened before GSO/GRO support was implemented, and it's unclear if io_uring will be a net benefit on Linux, though we'd be happy to mentor someone who wants to experiment.
It sounds super interesting to check out, though I'm not sure yet if I will manage. I would start by creating a hack to plug in monoio and check the performance metrics. Hence, the first question is whether there are any existing benchmarks that I can use to measure the performance. And the second: where to start from?
See the bench and perf directories for some benchmarking tools. As discussed in #1319, tokio-uring might be a more appropriate place to start. You could look at @Matthias247's prototype above for inspiration, or try modifying the endpoint driver in quinn directly, or try building directly on top of quinn-proto to avoid being influenced by the existing architecture.
@sowhu do you have more context on the talk? Not sure which Stephen you're referring to.
Sorry guys. I just realized I posted on the wrong issue. Please just disregard my previous two comments. My mistake.
For reference, @djc, regarding io_uring libraries in Rust: if the plots by monoio are still up-to-date and the benchmark scenarios are fair, it looks like monoio is the fastest choice, see
Those benchmarks don't seem to involve tokio-uring, just regular epoll-based tokio.
I did some benchmarks with tokio-uring (which is single-threaded). It scores 370k/s, while a single-threaded monoio scores 270k/s. Monoio can scale to multiple threads, but so can tokio-uring (if you just spawn multiple executors!).
TL;DR: tokio-uring seems to be the way to go.
I would really like to use QUIC with io_uring! Thanks.