liburing Feature Request: API for Partial Wait (with Optional Timeout) for IO Completions

io_uring_wait_cqes currently waits for all wait_nr IO completions. It would be nice to have a slightly different API that doesn't wait for all wait_nr, but instead returns as soon as any are available and hands back all IO completions that are available at that point. I have tried to produce similar behavior by using io_uring_peek_batch_cqe (see below), but in the "wait" case it only ever returns one completion, instead of all that are available after the wait.

uint32_t eventq_dequeue(struct io_uring* queue, struct io_uring_cqe** events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        result = io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        result = io_uring_wait_cqe(queue, events);
    }
    return result == 0 ? 1 : 0;
}

I'd really love to have a single function that has (essentially) the same signature as eventq_dequeue above.

Jul 29 '22 14:07 nibanks

I think you're confusing two things. The wait api does not return events to you, those you can just find and reap from user space. Those are two different operations. Sounds like what you want is just using 1 for the wait count, and then you just iterate completions when that returns.

Jul 29 '22 15:07 axboe

I am coalescing the (possible) wait and the return of completions, but I'm not sure if it's very efficient to do the following with the existing APIs (I'd be happy if I'm wrong!):

1. Check if there are any completions.
2. If so, return them.
3. Else if wait == 0, return empty.
4. Wait the specified time (possibly infinity).
5. On wake, return all available completions.

Jul 29 '22 15:07 nibanks

The only expensive part in that list is the waiting on the events. Checking for events is just a memory read. So yes, that is the expected use case.
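To illustrate why the check is cheap, here is a deliberately simplified, hypothetical model of a completion ring. It is not liburing's real layout, and the toy_* names are invented for this sketch. It assumes a single-threaded demo in which the "kernel" side is just another function call. The point is that counting ready completions is one acquire load plus a subtraction, with no syscall involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified completion ring: NOT liburing's real layout. */
#define TOY_CQ_ENTRIES 8u   /* power of two, so index masking works */

struct toy_cqe {
    uint64_t user_data;
    int32_t  res;
};

struct toy_cq {
    _Atomic uint32_t tail;   /* advanced by the producer (the kernel, in io_uring) */
    uint32_t head;           /* advanced by the consumer (the application) */
    struct toy_cqe entries[TOY_CQ_ENTRIES];
};

/* Number of completions ready to reap: one acquire load and a subtraction. */
static uint32_t toy_cq_ready(struct toy_cq *cq)
{
    return atomic_load_explicit(&cq->tail, memory_order_acquire) - cq->head;
}

/* Producer side, standing in for the kernel posting a completion. */
static void toy_cq_post(struct toy_cq *cq, uint64_t user_data, int32_t res)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);

    assert(tail - cq->head < TOY_CQ_ENTRIES);   /* ring must not be full */
    cq->entries[tail & (TOY_CQ_ENTRIES - 1)] = (struct toy_cqe){ user_data, res };
    atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
}

/* Consumer side: copy out up to max entries, then advance head once. */
static uint32_t toy_cq_reap(struct toy_cq *cq, struct toy_cqe *out, uint32_t max)
{
    uint32_t ready = toy_cq_ready(cq);
    uint32_t n = ready < max ? ready : max;

    for (uint32_t i = 0; i < n; i++)
        out[i] = cq->entries[(cq->head + i) & (TOY_CQ_ENTRIES - 1)];
    cq->head += n;
    return n;
}
```

In real code the equivalent check would be a call such as io_uring_cq_ready on the ring; the model above only shows the shape of the work involved.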

Jul 29 '22 15:07 axboe

So, should I update my function above to end with return io_uring_peek_batch_cqe(queue, events, count); instead of return result == 0 ? 1 : 0;? Is that really optimal? Would/could it be more efficient to put all this into a single io_uring_* function?

Jul 29 '22 15:07 nibanks

I'm OOO today so only on the phone, hence haven't looked at your code at all. I'll check later.

Jul 29 '22 16:07 axboe

I suggest

io_uring_submit_and_wait(&ring, 1);
struct io_uring_cqe *cqe;
unsigned head;
int cqe_count = 0;
io_uring_for_each_cqe(&ring, head, cqe) {
    ++cqe_count;
    /* use cqe here */
}
io_uring_cq_advance(&ring, cqe_count);

Jul 29 '22 17:07 CarterLi

Thanks for the suggestion @CarterLi but I am trying to implement an abstraction layer that works with multiple different IO models, on different platforms. That's what the eventq_dequeue function above is for.

Jul 29 '22 17:07 nibanks

Then just have the caller iterate and do the advance of the cq ring. Either that, or you'd need to copy the event which isn't ideal.

Jul 29 '22 17:07 axboe

What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns you the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

Jul 29 '22 17:07 nibanks

And what about a io_uring_wait_cqe_timeout equivalent? I found io_uring_submit_and_wait_timeout but it takes the cqe_ptr and a sigmask too, so I'm not sure if that's what I should use.

Jul 29 '22 18:07 nibanks

> What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns you the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

io_uring_submit_and_wait = io_uring_submit + io_uring_wait ( without returning cqe ) in one syscall
io_uring_wait_cqe = io_uring_wait + for_each_cqe(cqe) { return cqe }

Submit and wait both require syscalls, which are expensive, while returning the cqe ( the IO completion ) is only cheap memory reads.

io_uring_peek_batch_cqe fills a separate array with pointers to the entries in the CQ, which, IMO, is unnecessary. Just use io_uring_for_each_cqe.

Jul 29 '22 18:07 CarterLi

Ok, so I don't need the submit, because it's assumed that was already done, possibly on a different thread. So I'm back to a peek, wait (possibly with timeout), peek model. Though I didn't know about io_uring_cq_advance so that's better than returning 1 at a time.

uint32_t eventq_dequeue(eventq* queue, eventq_cqe* events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        (void)io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        (void)io_uring_wait_cqe(queue, events);
    }
    return io_uring_peek_batch_cqe(queue, events, count);
}
void eventq_return(eventq* queue, uint32_t count) {
    io_uring_cq_advance(queue, count);
}

My proposed changes: https://github.com/nibanks/eventq/pull/7

Jul 29 '22 18:07 nibanks
