liburing

Feature Request: API for Partial Wait (with Optional Timeout) for IO Completions

Open nibanks opened this issue 3 years ago • 12 comments

io_uring_wait_cqes currently waits until all wait_nr IO completions have arrived. It would be nice to have a slightly different API that doesn't wait for all wait_nr but returns as soon as any are available, and returns all IO completions that are currently available. I have tried to produce similar behavior by using io_uring_peek_batch_cqe (see below), but in the "wait" case it only ever returns one completion, instead of all that are available after the wait.

uint32_t eventq_dequeue(struct io_uring* queue, struct io_uring_cqe** events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        result = io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        result = io_uring_wait_cqe(queue, events);
    }
    return result == 0 ? 1 : 0;
}

I'd really love to have a single function that has (essentially) the same signature as eventq_dequeue above.
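
The millisecond-to-timespec arithmetic inside eventq_dequeue can be checked in isolation. The following is a minimal sketch, not part of liburing: wait_timespec is a stand-in with the same two fields as struct __kernel_timespec (so the sketch compiles without kernel headers), and wait_ms_to_timespec and WAIT_FOREVER are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in with the same two fields as struct __kernel_timespec.
 * Assumption for illustration only: real code would use the kernel type. */
struct wait_timespec {
    int64_t   tv_sec;
    long long tv_nsec;
};

/* Sentinel meaning "wait forever", matching the UINT32_MAX check above. */
#define WAIT_FOREVER UINT32_MAX

/* Convert a wait in milliseconds into seconds plus nanoseconds.
 * Returns false when the caller asked to wait forever; *out is then left
 * untouched and no timeout should be passed to the wait call. */
static bool wait_ms_to_timespec(uint32_t wait_ms, struct wait_timespec *out)
{
    if (wait_ms == WAIT_FOREVER)
        return false;
    out->tv_sec  = wait_ms / 1000;                          /* whole seconds */
    out->tv_nsec = (long long)(wait_ms % 1000) * 1000000LL; /* leftover ms as ns */
    return true;
}
```

For example, a wait of 1500 ms splits into tv_sec = 1 and tv_nsec = 500000000.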

nibanks avatar Jul 29 '22 14:07 nibanks

I think you're confusing two things. The wait API does not return events to you; those you can just find and reap from user space. Those are two different operations. It sounds like what you want is to use 1 for the wait count, and then iterate over the completions when the wait returns.

axboe avatar Jul 29 '22 15:07 axboe

I am coalescing the (possible) wait and the return of completions, but I'm not sure if it's very efficient to do the following with the existing APIs (I'd be happy if I'm wrong!):

  1. Check if there are any completions.
  2. If so, return them.
  3. Else if wait == 0, return empty.
  4. Wait the specified time (possibly infinity).
  5. On wake, return all available completions.
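
The five steps above can be sketched independently of any particular IO backend. In this minimal sketch, the eventq_ops hooks, eventq_dequeue_generic, and the fake_queue test double are all hypothetical names invented for illustration. For io_uring, peek would presumably wrap a non-blocking check of the completion ring, and wait would wrap a blocking wait with an optional timeout.

```c
#include <stdint.h>

/* Hypothetical backend hooks (not a liburing API). */
struct eventq_ops {
    /* Non-blocking: store up to `count` ready events, return how many. */
    uint32_t (*peek)(void *ctx, uint64_t *events, uint32_t count);
    /* Blocking: return once an event is ready or wait_ms has elapsed. */
    void (*wait)(void *ctx, uint32_t wait_ms);
};

static uint32_t eventq_dequeue_generic(const struct eventq_ops *ops, void *ctx,
                                       uint64_t *events, uint32_t count,
                                       uint32_t wait_ms)
{
    uint32_t n = ops->peek(ctx, events, count); /* step 1: check */
    if (n > 0 || wait_ms == 0)                  /* steps 2 and 3 */
        return n;
    ops->wait(ctx, wait_ms);                    /* step 4: the only blocking call */
    return ops->peek(ctx, events, count);       /* step 5: drain what arrived */
}

/* Fake backend used only to exercise the control flow. */
struct fake_queue {
    uint32_t ready;          /* events available right now */
    uint32_t arrive_on_wait; /* events that show up during the next wait */
    uint32_t wait_calls;     /* number of times wait() was invoked */
};

static uint32_t fake_peek(void *ctx, uint64_t *events, uint32_t count)
{
    struct fake_queue *q = ctx;
    uint32_t n = q->ready < count ? q->ready : count;
    for (uint32_t i = 0; i < n; i++)
        events[i] = i;       /* placeholder payload */
    q->ready -= n;
    return n;
}

static void fake_wait(void *ctx, uint32_t wait_ms)
{
    struct fake_queue *q = ctx;
    (void)wait_ms;           /* the fake never actually sleeps */
    q->wait_calls++;
    q->ready += q->arrive_on_wait;
    q->arrive_on_wait = 0;
}

static const struct eventq_ops fake_ops = { fake_peek, fake_wait };
```

With the fake backend, a queue that already holds events returns them without ever calling wait, and an empty queue with a non-zero wait calls wait exactly once before the second peek.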

nibanks avatar Jul 29 '22 15:07 nibanks

The only expensive part in that list is the waiting on the events. Checking for events is just a memory read. So yes, that is the expected use case.
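
To illustrate why checking for events is cheap, here is a simplified, self-contained model of a completion ring. It is not liburing's implementation: every model_* name is hypothetical, and in the real thing the kernel plays the producer role. The point is only that the number of ready completions is the difference between two indices in shared memory, so no syscall is needed to check or reap.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_CQ_ENTRIES 8u                    /* ring size, a power of two */
#define MODEL_CQ_MASK    (MODEL_CQ_ENTRIES - 1u)

struct model_cqe { uint64_t user_data; int32_t res; };

struct model_cq {
    _Atomic uint32_t head;  /* advanced by the consumer (the application) */
    _Atomic uint32_t tail;  /* advanced by the producer */
    struct model_cqe cqes[MODEL_CQ_ENTRIES];
};

static void model_cq_init(struct model_cq *cq)
{
    atomic_init(&cq->head, 0);
    atomic_init(&cq->tail, 0);
}

/* Producer side: publish one completion. Assumes the ring is not full. */
static void model_cq_post(struct model_cq *cq, uint64_t user_data, int32_t res)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
    cq->cqes[tail & MODEL_CQ_MASK] = (struct model_cqe){ user_data, res };
    atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
}

/* Consumer side: how many completions are ready. Two loads, one subtraction. */
static uint32_t model_cq_ready(struct model_cq *cq)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    return tail - head;
}

/* Consumer side: copy out up to `max` entries, then release their slots. */
static uint32_t model_cq_reap(struct model_cq *cq, struct model_cqe *out,
                              uint32_t max)
{
    uint32_t ready = model_cq_ready(cq);
    uint32_t n = ready < max ? ready : max;
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    for (uint32_t i = 0; i < n; i++)
        out[i] = cq->cqes[(head + i) & MODEL_CQ_MASK];
    atomic_store_explicit(&cq->head, head + n, memory_order_release);
    return n;
}
```

Copying entries out before advancing head lets the slots be reused immediately. The alternative is to hand out pointers into the ring and advance head only after the caller is done with them.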

axboe avatar Jul 29 '22 15:07 axboe

So, should I update my function above to return io_uring_peek_batch_cqe(queue, events, count); instead of return result == 0 ? 1 : 0;? Is that really optimal? Would/could it be more efficient to put all this into a single io_uring_* function?

nibanks avatar Jul 29 '22 15:07 nibanks

I'm OOO today so only on the phone, hence haven't looked at your code at all. I'll check later.

axboe avatar Jul 29 '22 16:07 axboe

I suggest

io_uring_submit_and_wait(&ring, 1);
struct io_uring_cqe *cqe;
unsigned head;
int cqe_count = 0;
io_uring_for_each_cqe(&ring, head, cqe) {
    ++cqe_count;
    /* use cqe here */
}
io_uring_cq_advance(&ring, cqe_count);

CarterLi avatar Jul 29 '22 17:07 CarterLi

Thanks for the suggestion @CarterLi but I am trying to implement an abstraction layer that works with multiple different IO models, on different platforms. That's what the eventq_dequeue function above is for.

nibanks avatar Jul 29 '22 17:07 nibanks

Then just have the caller iterate and do the advance of the cq ring. Either that, or you'd need to copy the event which isn't ideal.

axboe avatar Jul 29 '22 17:07 axboe

What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

nibanks avatar Jul 29 '22 17:07 nibanks

And what about an io_uring_wait_cqe_timeout equivalent? I found io_uring_submit_and_wait_timeout, but it takes the cqe_ptr and a sigmask too, so I'm not sure if that's what I should use.

nibanks avatar Jul 29 '22 18:07 nibanks

What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

io_uring_submit_and_wait = io_uring_submit + io_uring_wait (without returning cqe) in one syscall
io_uring_wait_cqe = io_uring_wait + for_each_cqe(cqe) { return cqe }

Submit and wait both require syscalls, which are expensive, while returning the cqe (the IO completion) is only cheap memory reads.

io_uring_peek_batch_cqe copies pointers to the entries in the CQ into another buffer, which, IMO, is unnecessary and useless. Just use io_uring_for_each_cqe.

CarterLi avatar Jul 29 '22 18:07 CarterLi

Ok, so I don't need the submit, because it's assumed that was already done, possibly on a different thread. So I'm back to a peek, wait (possibly with timeout), peek model. Though I didn't know about io_uring_cq_advance so that's better than returning 1 at a time.

uint32_t eventq_dequeue(eventq* queue, eventq_cqe* events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        (void)io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        (void)io_uring_wait_cqe(queue, events);
    }
    return io_uring_peek_batch_cqe(queue, events, count);
}
void eventq_return(eventq* queue, uint32_t count) {
    io_uring_cq_advance(queue, count);
}

My proposed changes: https://github.com/nibanks/eventq/pull/7

nibanks avatar Jul 29 '22 18:07 nibanks