quinn Completion-based I/O

Modern high-performance I/O APIs (e.g. io_uring and I/O completion ports) are completion-oriented, unlike the traditional readiness-oriented epoll paradigm. Advantages include fewer syscalls and no copying. On Windows in particular, readiness-oriented I/O is poorly supported.

@Matthias247 has reported that a prototype variant of Quinn modified to use registered I/O on Windows performs significantly (~20%?) better than our Linux backend, which uses sendmmsg and recvmmsg for efficient batching, and drastically better than the fallback backend on Windows. Due to major changes in tokio/mio's Windows support, we should re-evaluate the latter result after moving to tokio 0.3.

On Linux, io_uring is only available on recent kernels. Other platforms (e.g. macOS, BSDs) may not offer completion-based I/O at all. Retaining a readiness-oriented fallback will therefore remain necessary for some time. However, the performance benefits seem to justify making completion-based I/O the first-class target.
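Because the readiness fallback has to stay, the driver needs some way to pick a backend at startup. The following is a minimal sketch, assuming a hypothetical `Backend` enum and `choose_backend` helper (neither is a Quinn API). It gates io_uring on a kernel version check plus a runtime probe, since a version check alone cannot tell whether io_uring has been disabled on a given machine. io_uring was first merged in Linux 5.1, but the minimum for the specific operations a QUIC endpoint needs may be higher, so `min` is left as a parameter.

```rust
/// Hypothetical I/O backends an endpoint driver might choose between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    IoUring,
    Readiness,
}

/// Parses the leading "major.minor" of a Linux release string such as
/// "5.15.0-91-generic". Returns None if it cannot be parsed.
pub fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split(|c: char| c == '.' || c == '-');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Picks io_uring only if the kernel is at least `min` AND a runtime probe
/// (e.g. actually trying to create a ring) succeeded; otherwise falls back
/// to the readiness-based backend.
pub fn choose_backend(release: Option<&str>, min: (u32, u32), probe_ok: bool) -> Backend {
    match release.and_then(parse_kernel_version) {
        // Tuples compare lexicographically: (6, 1) >= (5, 1), (5, 0) < (5, 1).
        Some(v) if v >= min && probe_ok => Backend::IoUring,
        _ => Backend::Readiness,
    }
}
```

On Linux the release string can be read from `/proc/sys/kernel/osrelease`; on platforms with no completion-based API, passing `None` always yields the readiness backend.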

A complicating factor is that tokio itself is, currently, 100% readiness-oriented. On Linux, we may be able to bridge this gap gracefully due to io_uring/epoll interop, but it's unclear if something similar is possible on Windows. If not, a background thread may be necessary.
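The background-thread option can be sketched as a bridge that presents completion semantics on top of a blocking receive. This is a minimal illustration using only `std`; `CompletionBridge` and `Completion` are hypothetical names, and a real driver would use RIO or io_uring underneath instead of a blocking call. The property it is meant to show is ownership: the caller gives up the buffer on submit and only gets it back together with the result.

```rust
use std::io;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// A finished receive: the buffer comes back to the caller together with
/// the result, mirroring how completion APIs return buffer ownership.
pub struct Completion {
    pub buf: Vec<u8>,
    pub result: io::Result<usize>,
}

/// Hypothetical bridge: owned buffers are sent to a background thread,
/// which performs a blocking receive into each one and posts a Completion.
pub struct CompletionBridge {
    submissions: Sender<Vec<u8>>,
    completions: Receiver<Completion>,
}

impl CompletionBridge {
    pub fn new<F>(mut recv: F) -> Self
    where
        F: FnMut(&mut [u8]) -> io::Result<usize> + Send + 'static,
    {
        let (submit_tx, submit_rx) = mpsc::channel::<Vec<u8>>();
        let (done_tx, done_rx) = mpsc::channel::<Completion>();
        // The worker exits once the bridge (and thus `submit_tx`) is dropped,
        // after any receive it is currently blocked in returns.
        thread::spawn(move || {
            for mut buf in submit_rx {
                let result = recv(buf.as_mut_slice());
                if done_tx.send(Completion { buf, result }).is_err() {
                    break; // nobody is listening for completions any more
                }
            }
        });
        CompletionBridge { submissions: submit_tx, completions: done_rx }
    }

    /// Hands ownership of `buf` to the background thread.
    pub fn submit(&self, buf: Vec<u8>) {
        self.submissions.send(buf).expect("worker thread exited");
    }

    /// Blocks until the next submitted receive completes (FIFO order).
    pub fn wait(&self) -> Completion {
        self.completions.recv().expect("worker thread exited")
    }
}
```

With a real socket, the closure could be `move |b| sock.recv(b)` for a `std::net::UdpSocket`. Note the cost: every datagram pays a thread hop, which is exactly the overhead a native completion API avoids.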

Nov 13 '20 21:11 Ralith

I pushed my POC for RIO and completion based IO to https://github.com/quinn-rs/quinn/pull/918

This might help give an idea of what is necessary. I think that in a model where Quinn owns all I/O and the associated thread, it's not terribly complicated. The fact that datagrams are received and transmitted in an all-or-nothing fashion, and that Quinn mostly has to deal with a single socket rather than lots of them, makes it easier than, e.g., implementing support for completion-oriented TCP. But obviously one still has to be a bit careful around ownership.
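Registered I/O on Windows and io_uring's registered (fixed) buffers both operate on a set of pre-registered buffers, so being careful around ownership largely means never touching a buffer while an operation on it is in flight. Below is a minimal sketch of that bookkeeping, assuming a hypothetical `BufferPool` type that is not taken from the prototype; `complete` stands in for the kernel posting a receive completion. Because a datagram arrives whole or not at all, a completed slot is either fully valid or simply returned to the pool.

```rust
/// State of one buffer slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SlotState {
    Free,
    InFlight,
    Ready(usize), // completed; this many bytes are valid
}

/// Hypothetical fixed pool of pre-allocated buffers.
pub struct BufferPool {
    buffers: Vec<Vec<u8>>,
    states: Vec<SlotState>,
}

impl BufferPool {
    pub fn new(slots: usize, size: usize) -> Self {
        BufferPool {
            buffers: vec![vec![0; size]; slots],
            states: vec![SlotState::Free; slots],
        }
    }

    /// Reserves a free slot for a new receive; None if every slot is busy.
    pub fn acquire(&mut self) -> Option<usize> {
        let slot = self.states.iter().position(|s| *s == SlotState::Free)?;
        self.states[slot] = SlotState::InFlight;
        Some(slot)
    }

    /// Simulated completion: either the whole datagram lands in the slot
    /// (truncated to the buffer size) or the operation failed (None) and
    /// the slot goes straight back to the pool.
    pub fn complete(&mut self, slot: usize, datagram: Option<&[u8]>) {
        assert_eq!(self.states[slot], SlotState::InFlight, "slot not in flight");
        self.states[slot] = match datagram {
            Some(d) => {
                let n = d.len().min(self.buffers[slot].len());
                self.buffers[slot][..n].copy_from_slice(&d[..n]);
                SlotState::Ready(n)
            }
            None => SlotState::Free,
        };
    }

    /// Received bytes, available only once the slot has completed.
    pub fn data(&self, slot: usize) -> Option<&[u8]> {
        match self.states[slot] {
            SlotState::Ready(n) => Some(&self.buffers[slot][..n]),
            _ => None,
        }
    }

    /// Returns a processed slot to the pool so it can be resubmitted.
    pub fn release(&mut self, slot: usize) {
        if let SlotState::Ready(_) = self.states[slot] {
            self.states[slot] = SlotState::Free;
        }
    }
}
```

The point of the explicit states is that reading an in-flight slot returns `None` instead of racing with the kernel.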

Nov 14 '20 21:11 Matthias247

Are there any plans to use io_uring? Have you considered integrating DataDog's glommio crate? Also, nowadays AF_XDP looks like another possible way to proceed, which might be simpler(?) in some ways.

May 26 '22 13:05 KirillLykov

See also discussion at https://github.com/quinn-rs/quinn/issues/1319. This issue was originally opened before GSO/GRO support was implemented, and it's unclear whether io_uring will be a net benefit on Linux, though we'd be happy to mentor someone who wants to experiment.
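For context on why GSO changes the picture: with UDP GSO the application hands the kernel one buffer holding several datagrams plus a segment size, and the kernel splits it, so every segment is `segment_size` bytes except possibly the last. That already cuts the per-datagram syscall count, which is presumably why the remaining gain from io_uring is uncertain. A rough sketch of the arithmetic follows; both helpers are hypothetical, and the per-call segment limit is a parameter because it is kernel-dependent.

```rust
use std::ops::Range;

/// Splits a GSO send buffer of `total_len` bytes into per-datagram ranges.
/// Every segment is `segment_size` bytes except possibly the last.
pub fn gso_segments(total_len: usize, segment_size: usize) -> Vec<Range<usize>> {
    assert!(segment_size > 0, "segment size must be non-zero");
    (0..total_len)
        .step_by(segment_size)
        .map(|start| start..(start + segment_size).min(total_len))
        .collect()
}

/// Number of send calls needed for `datagrams` packets when each call can
/// carry at most `max_segments` of them (1 means no GSO).
pub fn send_calls(datagrams: usize, max_segments: usize) -> usize {
    assert!(max_segments > 0, "max_segments must be non-zero");
    (datagrams + max_segments - 1) / max_segments
}
```

For example, 100 datagrams need 100 calls without GSO but only 2 if each call may carry 64 segments.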

May 27 '22 00:05 Ralith

> See also discussion at #1319. This issue was originally opened before GSO/GRO support was implemented, and it's unclear whether io_uring will be a net benefit on Linux, though we'd be happy to mentor someone who wants to experiment.

It sounds super interesting to check out, though I'm not sure I'll manage it yet. I would start by hacking in monoio as a plug-in and checking the performance metrics. So my first question is whether there are any existing benchmarks I can use to measure performance, and my second is where to start.

May 27 '22 11:05 KirillLykov

See the bench and perf directories for some benchmarking tools. As discussed in #1319, tokio-uring might be a more appropriate place to start. You could look at @Matthias247's prototype above for inspiration, or try modifying the endpoint driver in quinn directly, or try building directly on top of quinn-proto to avoid being influenced by the existing architecture.

May 27 '22 16:05 Ralith

@sowhu do you have more context on the talk? Not sure which Stephen you're referring to.

Jun 27 '22 11:06 djc

Sorry guys. I just realized I posted on the wrong issue. Please just disregard my previous two comments. My mistake.

Jun 27 '22 14:06 sowhu

For reference, @djc, regarding io_uring libraries in Rust: if monoio's plots are still up to date and the benchmark scenarios are fair, monoio looks like the fastest choice, see

Jun 27 '22 15:06 KirillLykov

Those benchmarks don't seem to involve tokio-uring, just regular epoll-based tokio.

Jun 27 '22 16:06 Ralith

> Those benchmarks don't seem to involve tokio-uring, just regular epoll-based tokio.

I did some benchmarks with tokio-uring (which is single-threaded). It scores 370k/s, while a single-threaded monoio scores 270k/s. Monoio can scale to multiple threads, but so can tokio-uring (if you just spawn multiple executors!).

TL;DR: tokio-uring seems to be the way to go.

I would really like to use QUIC with io_uring! Thanks.

Feb 04 '23 14:02 Icelk
