
Support `io_uring`

Open rocallahan opened this issue 4 years ago • 8 comments

Here's a possible approach.

  • On io_uring_setup, create a file monitor identifying the fd as an io_uring fd.
  • When that fd is mapped, remove any MAP_FIXED flag and set the prot flags to read/write and let the syscall proceed. This returns the address of the real uring buffer. Map a same-sized area of memory for the application's use (reapplying MAP_FIXED and with the right prot flags, if necessary) and return that address to the application. rr remembers the connection between the two buffers; when the fake uring buffer is unmapped, we also have to unmap the real buffer.
  • Before and after io_uring_enter, and possibly at other times when we trap to rr, if there are submission queue entries in a fake buffer that haven't been copied to the real buffer, copy them, update the fake buffer head pointer, and record that change. Also remember any user-space memory ranges that the kernel may write to, associated with their queue entry.
  • Before and after io_uring_enter, and possibly at other times when we trap to rr, if there are completion queue entries in the real buffer that haven't been copied to the fake buffer, copy and record them, and also record any associated user-space buffers.
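
To make the two copy steps concrete, here is a minimal toy model in Python, not rr code. It assumes each queue is a plain power-of-two ring with free-running head and tail indices; the real io_uring layout (separate SQE array, index indirection, memory barriers) is deliberately left out. `Ring`, `sync_submissions`, `sync_completions` and the `trace` list are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Toy ring buffer. head is the consumer index, tail the producer index."""
    size: int
    head: int = 0
    tail: int = 0
    slots: list = field(init=False, repr=False)

    def __post_init__(self):
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        self.slots = [None] * self.size

    def pending(self):
        return self.tail - self.head

    def free(self):
        return self.size - self.pending()

    def push(self, entry):
        if self.free() == 0:
            raise BufferError("ring is full")
        self.slots[self.tail & (self.size - 1)] = entry
        self.tail += 1

    def pop(self):
        if self.pending() == 0:
            raise IndexError("ring is empty")
        entry = self.slots[self.head & (self.size - 1)]
        self.head += 1
        return entry


def sync_submissions(fake_sq, real_sq, trace):
    """Move uncopied submissions from the fake ring to the real ring.

    Here the tracer plays the kernel's role for the fake ring (it advances
    head) and the application's role for the real ring (it advances tail).
    """
    moved = 0
    while fake_sq.pending() and real_sq.free():
        real_sq.push(fake_sq.pop())
        moved += 1
    if moved:
        # Assumption: recording the new fake head is enough to replay this step.
        trace.append(("sq_head", fake_sq.head))
    return moved


def sync_completions(real_cq, fake_cq, trace):
    """Move new completions from the real ring to the fake ring, recording each."""
    moved = 0
    while real_cq.pending() and fake_cq.free():
        cqe = real_cq.pop()
        fake_cq.push(cqe)
        trace.append(("cqe", cqe))
        moved += 1
    return moved
```

In this model both functions would run before and after every io_uring_enter. Entries that do not fit in the destination ring simply stay pending until the next sync.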

This won't be very fast, since in many cases it will mean more io_uring_enter syscalls than without rr, and all io_uring_enter syscalls will require trapping to rr (i.e. 4 context switches), but if the submission queue is large then we will batch a lot of I/O operations per trap, a bit like syscallbuf. (Trying to integrate io_uring with syscallbuf seems pointless since we get the batching effect as-is. If necessary we could make the real buffers bigger than the fake buffers.) So performance might be close to as good as one could expect.

This assumes application threads don't race with the kernel's writes to user-space I/O buffers. If we don't want to assume that, we can extend this to allocate additional scratch buffers, rewrite submission-queue entries to point to those buffers, and copy the contents of those buffers to the right place when we see new completion queue entries.
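
A rough sketch of that scratch-buffer extension, again as a toy Python model rather than rr code. It assumes each submission carries a single destination buffer and a `user_data` key that the matching completion echoes back, with `res` holding the byte count; `ScratchRedirector` and the dict-shaped entries are hypothetical.

```python
class ScratchRedirector:
    """Redirect kernel writes into private scratch buffers until completion."""

    def __init__(self):
        # user_data -> (application buffer, scratch buffer)
        self.pending = {}

    def rewrite(self, sqe):
        """Return a copy of the submission that points at a fresh scratch buffer."""
        scratch = bytearray(len(sqe["buf"]))
        self.pending[sqe["user_data"]] = (sqe["buf"], scratch)
        return {**sqe, "buf": scratch}

    def complete(self, cqe):
        """On completion, copy what was written into the application's buffer."""
        app_buf, scratch = self.pending.pop(cqe["user_data"])
        written = max(cqe["res"], 0)  # a negative res means the operation failed
        app_buf[:written] = scratch[:written]
        return written
```

Because the application's buffer only changes inside `complete`, the write becomes visible at a point the tracer controls and can record, instead of whenever the kernel happens to perform it.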

rocallahan • Jun 29 '20 02:06

We are starting to run into binaries we want to record that use io_uring. Are there any (active) plans to add support for io_uring in rr?

asm89 • May 25 '21 20:05

Are you running into binaries that would work fine if rr returned ENOSYS for the io_uring syscalls and they fell back to whatever they used before, or are you running into binaries that need io_uring to be supported in rr?

khuey • May 25 '21 20:05

I had a quick look. I think most binaries would work if rr returned ENOSYS. Recording is currently blocked because an internal assert in rr fires.

That said, we'd also be interested in io_uring actually being supported. Most systems our (test) binaries run on support io_uring, so where the backend is probed at runtime, I'm guessing io_uring is the most commonly used one at this point.

asm89 • May 26 '21 14:05

Can you tell us more about what you work on?

rocallahan • May 26 '21 14:05

Sorry for the delay in replying. I was afk for a bit.

I work in the continuous integration space at Facebook. We are trialing rr in our developer and CI environments. We have binaries and tests using io_uring. The assert firing is a blocker for further rollout.

Returning ENOSYS would unblock things for now. It would mean rr-recorded executions use a different I/O backend, which wouldn't be ideal long term.

asm89 • Jun 14 '21 12:06

Alright, 7854be5362baadc0143b956279e96f3c4f511dfa makes io_uring return ENOSYS for now.
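
For illustration only, this is roughly how an application with a fallback path might react to that ENOSYS. It is a sketch, not taken from rr or from any particular library: `choose_backend` and its `try_io_uring_setup` callable are hypothetical stand-ins for whatever wrapper an application uses around io_uring_setup.

```python
import errno


def choose_backend(try_io_uring_setup):
    """Pick an I/O backend, falling back when io_uring is reported as missing.

    try_io_uring_setup is a hypothetical callable that performs the real
    io_uring_setup call and raises OSError on failure.
    """
    try:
        try_io_uring_setup()
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The syscall is unavailable, as under rr with the change above.
            return "fallback"
        raise  # any other error is a real failure, not a missing feature
    return "io_uring"
```

Binaries that skip this kind of probe and assume io_uring is always present have no fallback to take.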

rocallahan • Jun 14 '21 22:06

Grr.. I'm working with binaries that don't have any fallback and only have io_uring as a backend...

vlovich • Oct 06 '23 22:10

If it's important to you, you could try contracting @khuey to do it.

rocallahan • Oct 07 '23 01:10