
linux: use io_uring for network i/o

Open bnoordhuis opened this issue 1 year ago • 7 comments

This is an invitation to brainstorm. :-)

There is currently a mismatch between io_uring's and libuv's I/O models that would stop libuv from achieving maximal performance if it used io_uring for network i/o, particularly when it comes to receiving incoming connections and packets.

https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023 describes best practices that can be summarized as:

  1. keep multiple i/o requests enqueued (mismatch: libuv's "firehose" approach to incoming connections/data)

  2. let io_uring manage buffers (mismatch: libuv punts memory management to the user)

I've been thinking about what needs to change and this is what I came up with so far (a rough header sketch follows the list):

  1. add request-based apis for reading data and accepting connections, like int uv_stream_read(uv_read_t* req, uv_stream_t* handle, uv_buf_t* bufs, size_t nbufs, unsigned flags, uv_stream_read_cb cb) where uv_stream_read_cb is void (*)(uv_read_t* req, ssize_t nread, uv_buf_t* bufs)

  2. add a memory pool api that tells libuv it can allocate this much memory for buffers. Strawman proposal: uv_loop_configure(loop, UV_LOOP_SET_BUFFER_POOL_SIZE, 8<<20) with 0 meaning "pick a suitable default"

  3. introduce a flag that tells uv_stream_read() to ignore bufs and interpret nbufs as "bytes to read into buffer pool" (kind of ugly, suggestions welcome)

  4. introduce a new int uv_release_buffers(loop, bufs, nbufs) (maybe s/loop/handle/?) that tells libuv it's okay to reuse those buffer pool slices again. Alternatively: reclaim the memory automatically when uv_stream_read_cb returns but that means users may have to copy, or that they hold on longer to the buffers than is needed (inefficient if they queue up new requests in the callback)
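
A rough header-level sketch of 1-4; everything below is strawman (including the flag name in item 3, which I haven't picked above) and none of it exists in libuv today:

```c
/* Strawman sketch only; names and signatures are all up for discussion. */
#include <uv.h>

typedef struct uv_read_s uv_read_t;  /* hypothetical request type */

/* 1. Request-based read; a request-based accept would look similar. */
typedef void (*uv_stream_read_cb)(uv_read_t* req, ssize_t nread, uv_buf_t* bufs);
int uv_stream_read(uv_read_t* req, uv_stream_t* handle, uv_buf_t* bufs,
                   size_t nbufs, unsigned flags, uv_stream_read_cb cb);

/* 2. Buffer pool opt-in, e.g.
 *      uv_loop_configure(loop, UV_LOOP_SET_BUFFER_POOL_SIZE, 8 << 20);
 *    with 0 meaning "pick a suitable default".
 *
 * 3. A flag such as UV_READ_USE_BUFFER_POOL (placeholder name) would tell
 *    uv_stream_read() to ignore bufs and treat nbufs as "bytes to read into
 *    the buffer pool".
 */

/* 4. Hand buffer pool slices back to libuv when the application is done. */
int uv_release_buffers(uv_loop_t* loop, uv_buf_t* bufs, size_t nbufs);
```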

Libuv-on-Windows is already request-based internally, so it should be relatively easy to adapt.

Suggestions for improvements very welcome!

bnoordhuis avatar Jun 10 '23 11:06 bnoordhuis

From a library user aspect:

  • A request-based API is much easier to integrate than the firehose style. We used to add lots of hacks to get the accept loop right because we can't stop accepting from inside the connection callback (reads are stoppable, so that problem is less severe).

  • Buffer management gets complicated. If the user callback deserializes the buffer into some struct and the buffer is supposed to be reused, the user has to be careful not to keep pointers into the buffer (as you said: they have to copy, or they hold on to the buffers longer than needed); see the sketch below. Maybe adding a buffer management API is not worth the effort.
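
A tiny hypothetical sketch of the hazard (nothing libuv-specific, just the aliasing problem):

```c
#include <stdlib.h>
#include <string.h>

struct message {
    const char *name;   /* dangerous: aliases the pooled buffer */
    char *name_copy;    /* safe: owned by the application */
};

/* pool_buf is a slice of a (hypothetical) libuv buffer pool that will be
 * reused for a later read once it is released. */
static void on_message(const char *pool_buf, size_t len, struct message *msg)
{
    /* BAD: once the pool slice is recycled, msg->name points at garbage. */
    msg->name = pool_buf;

    /* GOOD: copy out whatever must outlive the callback. */
    msg->name_copy = malloc(len + 1);
    if (msg->name_copy != NULL) {
        memcpy(msg->name_copy, pool_buf, len);
        msg->name_copy[len] = '\0';
    }
}
```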

winterland1989 avatar Jun 12 '23 04:06 winterland1989

I've been thinking about what needs to change and this is what I came up with so far:

  1. add request-based apis for reading data and accepting connections, like int uv_stream_read(uv_read_t* req, uv_stream_t* handle, uv_buf_t* bufs, size_t nbufs, unsigned flags, uv_stream_read_cb cb) where uv_stream_read_cb is void (*)(uv_read_t* req, ssize_t nread, uv_buf_t* bufs)

:+1: to this, I was bitten by not having this kind of api when I was experimenting with io_uring in the past. Also, something that moves https://github.com/libuv/leps/pull/2 forward is a plus.

  2. add a memory pool api that tells libuv it can allocate this much memory for buffers. Strawman proposal: uv_loop_configure(loop, UV_LOOP_SET_BUFFER_POOL_SIZE, 8<<20) with 0 meaning "pick a suitable default"

How would this work exactly? IIUC, IORING_OP_PROVIDE_BUFFERS also needs the number of buffers and their size to be defined. Would we need to define at least one of those, or would that be an implementation detail libuv takes care of? In the same vein, I see it's recommended in some cases to register multiple sets of buffers. Would that also be an implementation detail?
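
For reference, registering a pool with liburing's provided-buffers interface looks roughly like this (illustration only; the slice size, buffer count and group id here are made-up values, which are exactly the knobs I'm asking about):

```c
#include <liburing.h>

#define POOL_SIZE (8 << 20)    /* the 8 MiB from the strawman proposal */
#define BUF_SIZE  (64 << 10)   /* per-buffer slice size: libuv's choice? */
#define NR_BUFS   (POOL_SIZE / BUF_SIZE)
#define BUF_GROUP 0            /* buffer group id referenced by later reads */

/* Hand NR_BUFS buffers of BUF_SIZE bytes each (carved out of one contiguous
 * allocation) over to the kernel for this ring. */
static int provide_pool(struct io_uring *ring, void *pool)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (sqe == NULL)
        return -1;

    io_uring_prep_provide_buffers(sqe, pool, BUF_SIZE, NR_BUFS, BUF_GROUP, 0);
    return io_uring_submit(ring);
}
```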

  3. introduce a flag that tells uv_stream_read() to ignore bufs and interpret nbufs as "bytes to read into buffer pool" (kind of ugly, suggestions welcome)

lgtm.

  4. introduce a new int uv_release_buffers(loop, bufs, nbufs) (maybe s/loop/handle/?) that tells libuv it's okay to reuse those buffer pool slices again. Alternatively: reclaim the memory automatically when uv_stream_read_cb returns but that means users may have to copy, or that they hold on longer to the buffers than is needed (inefficient if they queue up new requests in the callback)

Why the s/loop/handle/? I'm torn on this one: while I like the flexibility of having an API like that, it seems to me it would make it much harder for users to get it right. Could we have a flag, UV_STREAM_NOT_RECLAIM_MEMORY or something like that, that would allow us to have both options available?

santigimeno avatar Jun 15 '23 11:06 santigimeno

Yes, it would be entirely up to libuv to decide how to chop up the allotment; I don't want too many io_uring eccentricities sneaking into the API because it also needs to work meaningfully on other platforms.

That said, I'm leaning towards shelving buffer management for now and just implementing the simple thing first. The flags argument lets us add it later as an opt-in.

bnoordhuis avatar Jun 15 '23 15:06 bnoordhuis

Cool stuff. I just wanted to mention two other projects in the same area, abstracting IO over multiple platforms like Linux with io_uring, Windows IOCP and macOS. The first is libxev by Mitchell Hashimoto (https://github.com/mitchellh/libxev), a library for doing async IO that adopts the io_uring style (perhaps good to study for how it works on other platforms?). The second is TigerBeetle, which is a database, but it keeps its IO code fairly neatly in one directory (https://github.com/tigerbeetle/tigerbeetle/tree/main/src/io) so it's quite easy to read. Both are written in Zig, though libxev exposes a C API. Perhaps one can draw some inspiration from their work :)

plajjan avatar Sep 03 '23 20:09 plajjan

I opt for keeping the firehose approach for incoming connections/data. io_uring has a handy feature called multishot (https://man.archlinux.org/man/io_uring.7.en#IORING_CQE_F_MORE), applicable to accept and read. Multishot can be described as one submission (SQE) producing multiple completions (CQEs). This feature closely matches the handle-based API in libuv.
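
For illustration, a multishot accept with liburing looks roughly like this (a minimal sketch, not libuv code):

```c
#include <liburing.h>

/* One accept SQE keeps producing a CQE per incoming connection for as long
 * as IORING_CQE_F_MORE is set in the completion flags. */
static void accept_loop(struct io_uring *ring, int listen_fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    struct io_uring_cqe *cqe;

    if (sqe == NULL)
        return;

    io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
    io_uring_submit(ring);

    for (;;) {
        if (io_uring_wait_cqe(ring, &cqe) < 0)
            break;

        if (cqe->res >= 0) {
            int client_fd = cqe->res;  /* would go to the connection callback */
            (void) client_fd;
        }

        /* When F_MORE is cleared the multishot request has terminated and
         * would have to be re-armed with a new SQE. */
        int more = cqe->flags & IORING_CQE_F_MORE;
        io_uring_cqe_seen(ring, cqe);
        if (!more)
            break;
    }
}
```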

ywave620 avatar Sep 22 '23 04:09 ywave620

If we switch reading from handle-based to request-based, the first dilemma that comes to mind is how we submit read operations to io_uring: one by one (io_uring_enter(ring_fd, 1/*to_submit*/, ...)) or in batches (io_uring_enter(ring_fd, n/*to_submit*/, ...))? The first way means libuv submits a read operation to io_uring every time the user issues a read request, which is good for latency but incurs more syscalls. The latter means libuv collects read requests and submits them all at a later time, which has the opposite pros/cons.

ywave620 avatar Sep 22 '23 04:09 ywave620

Please ignore my previous comment. I just spotted that libuv makes use of IORING_SETUP_SQPOLL, which exempts us from explicitly submitting operations (SQEs).
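
For reference, this is roughly what enabling SQPOLL looks like with liburing (a sketch with assumed parameters, not libuv's actual setup code): with IORING_SETUP_SQPOLL a kernel thread polls the submission queue, so userspace only has to call io_uring_enter() to wake it up after it has gone idle.

```c
#include <liburing.h>
#include <string.h>

static int init_sqpoll_ring(struct io_uring *ring)
{
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_SQPOLL;
    p.sq_thread_idle = 100;  /* ms of inactivity before the poller sleeps */

    /* 256 SQ entries is an arbitrary example value. */
    return io_uring_queue_init_params(256, ring, &p);
}
```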

ywave620 avatar Sep 22 '23 07:09 ywave620