Support `io_uring`
Here's a possible approach.
- On `io_uring_setup`, create a file monitor identifying the fd as an io_uring fd.
- When that fd is mapped, remove any `MAP_FIXED` flag, set the prot flags to read/write, and let the syscall proceed. This returns the address of the real uring buffer. Map a same-sized area of memory for the application's use (reapplying `MAP_FIXED` and with the right prot flags, if necessary) and return that address to the application. rr remembers the connection between the two buffers; when the fake uring buffer is unmapped, we also have to unmap the real buffer.
- Before and after `io_uring_enter`, and possibly at other times when we trap to rr, if there are submission queue entries in a fake buffer that haven't been copied to the real buffer, copy them, update the fake buffer head pointer, and record that change. Also remember any user-space memory ranges that the kernel may write to, associated with their queue entry.
- Before and after `io_uring_enter`, and possibly at other times when we trap to rr, if there are completion queue entries in the real buffer that haven't been copied to the fake buffer, copy and record them, and also record any associated user-space buffers.
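The two copying steps above can be sketched roughly as follows. This is only an illustration, not rr code: `Sqe`, `Cqe`, `Ring`, `WriteRange`, `Recorder`, and the `kernel_writes` flag are all hypothetical, simplified stand-ins. The real io_uring layout (including the submission queue's index array) and the memory barriers needed around head/tail updates are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the kernel's SQE/CQE layouts.
struct Sqe { uint64_t user_data; uint64_t addr; uint32_t len; bool kernel_writes; };
struct Cqe { uint64_t user_data; int32_t res; };

// Single-producer/single-consumer ring with free-running indices.
template <typename T>
struct Ring {
  std::vector<T> slots;
  uint32_t head = 0;  // next entry to consume
  uint32_t tail = 0;  // next slot to produce into
  explicit Ring(uint32_t size) : slots(size) {
    assert(size != 0 && (size & (size - 1)) == 0);  // size must be a power of two
  }
  uint32_t mask() const { return static_cast<uint32_t>(slots.size()) - 1; }
  bool empty() const { return head == tail; }
  bool full() const { return tail - head == slots.size(); }
  void push(const T& v) { assert(!full()); slots[tail++ & mask()] = v; }
  T pop() { assert(!empty()); return slots[head++ & mask()]; }
};

// A user-space range the kernel may write to, keyed by its queue entry.
struct WriteRange { uint64_t user_data; uint64_t addr; uint32_t len; };

// Hypothetical stand-in for the recorded trace.
struct Recorder {
  std::vector<uint32_t> fake_sq_heads;  // recorded updates of the fake SQ head
  std::vector<Cqe> cqes;                // recorded completion entries
};

// Drain the fake SQ into the real SQ; remember ranges the kernel may write.
uint32_t sync_submissions(Ring<Sqe>& fake_sq, Ring<Sqe>& real_sq,
                          std::vector<WriteRange>& pending, Recorder& rec) {
  uint32_t copied = 0;
  while (!fake_sq.empty() && !real_sq.full()) {
    Sqe sqe = fake_sq.pop();  // advances the fake head, as the kernel would
    real_sq.push(sqe);
    if (sqe.kernel_writes) pending.push_back({sqe.user_data, sqe.addr, sqe.len});
    ++copied;
  }
  if (copied) rec.fake_sq_heads.push_back(fake_sq.head);
  return copied;
}

// Drain the real CQ into the fake CQ, recording every entry copied.
// Recording the contents of the associated user-space buffers is omitted here.
uint32_t sync_completions(Ring<Cqe>& real_cq, Ring<Cqe>& fake_cq, Recorder& rec) {
  uint32_t copied = 0;
  while (!real_cq.empty() && !fake_cq.full()) {
    Cqe cqe = real_cq.pop();
    fake_cq.push(cqe);
    rec.cqes.push_back(cqe);
    ++copied;
  }
  return copied;
}
```

In this sketch rr plays the kernel's role on the fake rings (consuming submissions, producing completions) and the application's role on the real rings.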
This won't be very fast, since in many cases it will mean more `io_uring_enter` syscalls than without rr, and every `io_uring_enter` syscall will require trapping to rr (i.e. 4 context switches). But if the submission queue is large, we will batch a lot of I/O operations per trap, a bit like syscallbuf. (Trying to integrate io_uring with syscallbuf seems pointless since we get the batching effect as-is. If necessary we could make the real buffers bigger than the fake buffers.) So performance might be close to as good as one could expect.
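To make the batching argument concrete, here is a back-of-the-envelope sketch. The 4 context switches per trap come from the paragraph above; the function name and the batch sizes are illustrative assumptions, and the model ignores any extra `io_uring_enter` calls that rr itself might add.

```cpp
#include <cassert>

// From the text: each io_uring_enter trapped by rr costs about 4 context switches.
constexpr double kContextSwitchesPerTrap = 4.0;

// Amortized context switches per I/O operation when `ops_per_enter`
// submission queue entries are flushed by a single io_uring_enter.
double context_switches_per_op(unsigned ops_per_enter) {
  assert(ops_per_enter > 0);
  return kContextSwitchesPerTrap / ops_per_enter;
}
```

Under this model, one operation per `io_uring_enter` pays the full 4 switches, while a batch of 32 pays 0.125 switches per operation.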
This assumes application threads don't race with the kernel's writes to user-space I/O buffers. If we don't want to assume that, we can extend this to allocate additional scratch buffers, rewrite submission-queue entries to point to those buffers, and copy the contents of those buffers to the right place when we see new completion queue entries.
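The scratch-buffer extension could look something like the sketch below. It is a minimal illustration under stated assumptions: `ReadRequest`, `Completion`, and `ScratchRedirector` are hypothetical names, and keying pending requests by `user_data` assumes those values are unique among in-flight requests. io_uring does not guarantee that, so a real implementation would need its own key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified submission and completion entries.
struct ReadRequest { uint64_t user_data; uint8_t* buf; uint32_t len; };
struct Completion { uint64_t user_data; int32_t res; };

class ScratchRedirector {
 public:
  // Rewrite a request so the kernel writes into a scratch buffer we own
  // instead of the application's buffer.
  ReadRequest redirect(const ReadRequest& req) {
    assert(req.buf != nullptr);
    Pending& p = pending_[req.user_data];
    p.app_buf = req.buf;
    p.scratch.assign(req.len, 0);
    return ReadRequest{req.user_data, p.scratch.data(), req.len};
  }

  // On completion, copy the bytes the kernel produced to the application's
  // buffer. Returns the number of bytes copied (also what would be recorded).
  uint32_t complete(const Completion& c) {
    auto it = pending_.find(c.user_data);
    if (it == pending_.end()) return 0;
    uint32_t n = 0;
    if (c.res > 0) {
      n = std::min<uint32_t>(static_cast<uint32_t>(c.res),
                             static_cast<uint32_t>(it->second.scratch.size()));
      std::memcpy(it->second.app_buf, it->second.scratch.data(), n);
    }
    pending_.erase(it);
    return n;
  }

 private:
  struct Pending {
    uint8_t* app_buf = nullptr;
    std::vector<uint8_t> scratch;
  };
  std::unordered_map<uint64_t, Pending> pending_;
};
```

Because the application's buffer is only written at the point where the completion entry is processed, a racing application thread never observes a kernel write that has not been recorded.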