Support `io_uring`
Here's a possible approach.
- On `io_uring_setup`, create a file monitor identifying the fd as an io_uring fd.
- When that fd is mapped, remove any `MAP_FIXED` flag and set the prot flags to read/write and let the syscall proceed. This returns the address of the real uring buffer. Map a same-sized area of memory for the application's use (reapplying `MAP_FIXED` and with the right prot flags, if necessary) and return that address to the application. rr remembers the connection between the two buffers; when the fake uring buffer is unmapped, we also have to unmap the real buffer.
- Before and after `io_uring_enter`, and possibly at other times when we trap to rr, if there are submission queue entries in a fake buffer that haven't been copied to the real buffer, copy them, update the fake buffer head pointer, and record that change. Also remember any user-space memory ranges that the kernel may write to, associated with their queue entry. (A rough sketch of this step follows the list.)
- Before and after `io_uring_enter`, and possibly at other times when we trap to rr, if there are completion queue entries in the real buffer that haven't been copied to the fake buffer, copy and record them, and also record any associated user-space buffers.
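Here's a rough C++ sketch of the submission-side copy from the third bullet, just to make the double-buffering idea concrete. The ring-view struct, the `record_*` helpers, and the assumption that SQE slots map one-to-one to ring positions are all simplifications I'm making up for illustration; the real ring layout is described by the offsets `io_uring_setup` returns.

```cpp
#include <linux/io_uring.h>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical recording hooks standing in for whatever rr would really use.
void record_memory_range(const void* addr, size_t len);
void record_pending_kernel_write(uint64_t user_data, uint64_t addr, uint32_t len);

// Simplified view of one mapped submission ring (fake or real).
struct SqRingView {
  std::atomic<uint32_t>* head;  // consumer index
  std::atomic<uint32_t>* tail;  // producer index
  uint32_t ring_mask;           // ring_entries - 1
  uint32_t* array;              // indirection array into the SQE slab
  io_uring_sqe* sqes;           // separately mapped SQE array
};

// Before (and after) io_uring_enter: copy SQEs the tracee queued in the fake
// ring into the real ring, advance the fake head, and record the changes.
void flush_fake_sq_to_real(SqRingView& fake, SqRingView& real) {
  uint32_t head = fake.head->load(std::memory_order_acquire);
  uint32_t tail = fake.tail->load(std::memory_order_acquire);

  while (head != tail) {
    const io_uring_sqe& sqe = fake.sqes[fake.array[head & fake.ring_mask]];

    // Append to the real ring; for simplicity assume SQE slot == ring slot.
    uint32_t real_tail = real.tail->load(std::memory_order_relaxed);
    uint32_t slot = real_tail & real.ring_mask;
    real.sqes[slot] = sqe;
    real.array[slot] = slot;
    real.tail->store(real_tail + 1, std::memory_order_release);

    // Remember user-space memory the kernel may write for this entry, so it
    // can be recorded when the completion shows up. Vectored ops would need
    // to walk the iovec array instead.
    if (sqe.opcode == IORING_OP_READ) {
      record_pending_kernel_write(sqe.user_data, sqe.addr, sqe.len);
    }
    ++head;
  }

  // Advance the fake head so the tracee can reuse those slots, and record
  // that write so replay sees the same value.
  fake.head->store(head, std::memory_order_release);
  record_memory_range(fake.head, sizeof(uint32_t));
}
```

The completion-side copy would be the mirror image: drain new CQEs from the real ring into the fake ring, recording each CQE plus any user-space buffer remembered for its `user_data`.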
This won't be very fast, since in many cases it will mean more `io_uring_enter` syscalls than without rr, and all `io_uring_enter` syscalls will require trapping to rr (i.e. 4 context switches), but if the submission queue is large then we will batch a lot of I/O operations per trap --- a bit like syscallbuf. (Trying to integrate io_uring with syscallbuf seems pointless since we get the batching effect as-is. If necessary we could make the real buffers bigger than the fake buffers.) So performance might be close to as good as one could expect.
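To illustrate the batching this relies on, here's a minimal liburing-style application (the file name and sizes are made up): many operations are queued without any syscall at all, and a single `io_uring_submit` (one `io_uring_enter`, hence one trap) submits the whole batch.

```cpp
#include <liburing.h>
#include <fcntl.h>

int main() {
  io_uring ring;
  if (io_uring_queue_init(64, &ring, 0) < 0) return 1;

  int fd = open("data.bin", O_RDONLY);  // placeholder file name
  if (fd < 0) return 1;
  static char bufs[32][4096];

  // Queue 32 reads; no syscall happens here, the SQEs just land in the
  // shared submission ring.
  for (int i = 0; i < 32; i++) {
    io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, bufs[i], sizeof(bufs[i]), i * 4096ULL);
    io_uring_sqe_set_data(sqe, (void*)(long)i);
  }

  // One io_uring_enter submits all 32 operations, so one rr trap would
  // cover the whole batch.
  io_uring_submit(&ring);

  for (int i = 0; i < 32; i++) {
    io_uring_cqe* cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);
  }
  io_uring_queue_exit(&ring);
  return 0;
}
```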
This assumes application threads don't race with the kernel's writes to user-space I/O buffers. If we don't want to assume that, we can extend this to allocate additional scratch buffers, rewrite submission-queue entries to point to those buffers, and copy the contents of those buffers to the right place when we see new completion queue entries.
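If that assumption is dropped, the redirection could look roughly like this, continuing the earlier sketch; `allocate_scratch`, `copy_to_tracee`, and `record_memory_range_at` are again hypothetical stand-ins:

```cpp
#include <linux/io_uring.h>
#include <cstdint>
#include <unordered_map>

void* allocate_scratch(uint32_t len);                                    // hypothetical
void copy_to_tracee(uint64_t tracee_addr, const void* src, uint32_t n);  // hypothetical
void record_memory_range_at(uint64_t tracee_addr, uint32_t n);           // hypothetical

struct PendingRead {
  uint64_t tracee_addr;  // buffer the application expects the kernel to fill
  void* scratch;         // rr-owned buffer the kernel actually writes
};
std::unordered_map<uint64_t, PendingRead> pending;  // keyed by user_data

// Before forwarding a read SQE to the real ring: point it at a scratch buffer.
void redirect_read_sqe(io_uring_sqe& sqe) {
  void* scratch = allocate_scratch(sqe.len);
  pending[sqe.user_data] = { sqe.addr, scratch };
  sqe.addr = reinterpret_cast<uint64_t>(scratch);
}

// When the matching CQE is copied into the fake ring: move the data to where
// the application expects it and record that write.
void on_read_completion(const io_uring_cqe& cqe) {
  auto it = pending.find(cqe.user_data);
  if (it == pending.end()) return;
  if (cqe.res > 0) {
    copy_to_tracee(it->second.tracee_addr, it->second.scratch, (uint32_t)cqe.res);
    record_memory_range_at(it->second.tracee_addr, (uint32_t)cqe.res);
  }
  pending.erase(it);
}
```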
We are starting to run into binaries we want to record that use io_uring. Are there any (active) plans to add support for io_uring in rr?
Are you running into binaries that would work fine if rr returned ENOSYS for the io_uring syscalls and they fell back to whatever they used before, or are you running into binaries that need io_uring to be supported in rr?
I had a quick look. I think most binaries would work if rr returned ENOSYS. Recording is currently blocked because an internal assert in rr fires.
That said, we'd also be interested in io_uring actually being supported. Most systems our (test) binaries run on support io_uring, so where it's probed for, I'm guessing it's the most commonly used backend at this point.
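For reference, the kind of runtime probe being discussed usually looks something like this (a liburing-based sketch; the non-io_uring fallback path is elided):

```cpp
#include <liburing.h>

// Typical runtime probe: if io_uring setup fails (e.g. with rr returning
// ENOSYS), fall back to an older backend such as epoll.
bool io_uring_available() {
  io_uring ring;
  int rc = io_uring_queue_init(8, &ring, 0);  // liburing returns -errno on failure
  if (rc < 0) return false;                   // rc == -ENOSYS when the syscall is unsupported
  io_uring_queue_exit(&ring);
  return true;
}
```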
Can you tell us more about what you work on?
Sorry for the delay in reply. I was afk for a bit.
I work in the continuous integration space at Facebook. We are trialing rr in our developer and CI environments. We have binaries and tests using io_uring. The assert firing is a blocker for further rollout.
Returning ENOSYS would unblock things for now. It would mean rr recorded executions use a different I/O backend which wouldn't be ideal long term.
Alright, 7854be5362baadc0143b956279e96f3c4f511dfa makes io_uring return ENOSYS for now.
Grr.. I'm working with binaries that don't have any fallback and only have io_uring as a backend...
If it's important to you, you could try contracting @khuey to do it.