liburing icon indicating copy to clipboard operation
liburing copied to clipboard

How to wait for a timeout efficiently.

Open ioquatix opened this issue 4 years ago • 10 comments

It's not clear to me how we should use io_uring_wait_cqes efficiently.

https://github.com/axboe/liburing/blob/a209abe65befe603311b6fcd861e14d28e3e2f2d/src/queue.c#L281

Essentially, I imagine the following logic:

  • Event loop
    • Peek for events - if some events completed, process them, otherwise:
      • io_uring_wait_cqes until next timer event should fire.

It seems like we should avoid calling io_uring_submit after registering some sqe (e.g. poll readable/writable) and rather do that if peeking for events returns 0 events.

In the documentation, it seems implied that calling io_uring_wait_cqes will call io_uring_submit but on closer inspection it seems it depends on kernel version and several other factors.

Therefore, the only way we could make this work was like this:

static
void * select_internal(void *_arguments) {
	struct select_arguments * arguments = (struct select_arguments *)_arguments;
	
	// **** Ensure that all pending events are submitted ****:
	io_uring_submit(&arguments->data->ring);
	
	int result = io_uring_wait_cqes(&arguments->data->ring, arguments->cqes, 1, arguments->timeout, NULL);
	
	if (result == -ETIME) {
		// If waiting resulted in a timeout, there are 0 events.
		arguments->count = 0;
	} else if (result == 0) {
		// Otherwise, there was no error, at least 1 event was reported. So we ask for them all.
		arguments->count = io_uring_peek_batch_cqe(&arguments->data->ring, arguments->cqes, URING_MAX_EVENTS);
	} else {
		arguments->count = result;
	}
	
	return NULL;
}

However, in some cases, it seems like this is 2 system calls when 1 would be sufficient.

When using strace to observe the above we see:

poll_add(0x7fc62c17c000, 6, 25, 0x7fc6284255e0)
io_uring_enter(5, 1, 0, 0, NULL, 8)     = 1
io_uring_enter(5, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8

If we remove the io_uring_submit line, we see this:

poll_add(0x7f5b31560000, 6, 25, 0x7f5b2d80ed70)
io_uring_enter(5, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8

The 2nd argument is 0 (to_submit) and thus this seems like the sqe is never submitted correctly.

In this case, the event loop will never create a cqe for the poll_add sqe as it never seems to be submitted.

  1. How should we know to call io_uring_submit or not?
  2. Should we need to call it ever? Why can't the io_uring_wait_cqes invoke it predictably?

It seems inefficient to do two system calls when one could be sufficient.

ioquatix avatar May 11 '21 07:05 ioquatix

It's not clear to me how we should use io_uring_wait_cqes efficiently.

https://github.com/axboe/liburing/blob/a209abe65befe603311b6fcd861e14d28e3e2f2d/src/queue.c#L281

Essentially, I imagine the following logic:

* Event loop
  
  * Peek for events - if some events completed, process them, otherwise:
    
    * io_uring_wait_cqes until next timer event should fire.

It seems like we should avoid calling io_uring_submit after registering some sqe (e.g. poll readable/writable) and rather do that if peeking for events returns 0 events.

Why? You fill SQEs -- you submit them, maybe in a single syscall together with CQ waiting and/or delayed to collect enough SQEs for efficiency, but still as simple as this.

In the documentation, it seems implied that calling io_uring_wait_cqes will call io_uring_submit but on closer inspection it seems it depends on kernel version and several other factors.

It does not now and for good. Docs should be fixed, what file:line was it?

However, in some cases, it seems like this is 2 system calls when 1 would be sufficient.

So, use the right function instead of io_uring_wait_cqes(). See __io_uring_get_cqe().

Note: either of them return only 1 CQE, not an array of them. Then you can peek them (e.g. io_uring_peek_cqe()) or use io_uring_for_each_cqe. Only function returning an array is io_uring_peek_batch_cqe(), but take a look at the sources, it just does post-processing looping similarly as io_uring_for_each_cqe

The 2nd argument is 0 (to_submit) and thus this seems like the sqe is never submitted correctly.

In this case, the event loop will never create a cqe for the poll_add sqe as it never seems to be submitted.

1. How should we know to call `io_uring_submit` or not?

Userspace fills SQEs, so it should know when it needs to submit anything or not.

2. Should we need to call it ever? Why can't the `io_uring_wait_cqes` invoke it predictably?

As mentioned, because this function doesn't do that. See more generic functions.

isilence avatar May 11 '21 09:05 isilence

I guess __io_uring_get_cqe is missing timeout, we just need to export underlying _io_uring_get_cqe

isilence avatar May 11 '21 09:05 isilence

It does not now and for good. Docs should be fixed, what file:line was it?

https://unixism.net/loti/ref-liburing/completion.html

"If ts is specified, the application need not call io_uring_submit() before calling this function, as it will be done internally."

Why? You fill SQEs -- you submit them, maybe in a single syscall together with CQ waiting and/or delayed to collect enough SQEs for efficiency, but still as simple as this.

Yes, that's what I want.

So, use the right function instead of io_uring_wait_cqes(). See __io_uring_get_cqe().

Thanks, I was not aware it was okay to use this internal function.

Note: either of them return only 1 CQE, not an array of them. Then you can peek them (e.g. io_uring_peek_cqe()) or use io_uring_for_each_cqe. Only function returning an array is io_uring_peek_batch_cqe(), but take a look at the sources, it just does post-processing looping similarly as io_uring_for_each_cqe

Thanks I will prefer to process them using a loop as you suggest.

I guess __io_uring_get_cqe is missing timeout, we just need to export underlying _io_uring_get_cqe

Yes, I think there is some missing piece here, and that might be it.

ioquatix avatar May 11 '21 11:05 ioquatix

Userspace fills SQEs, so it should know when it needs to submit anything or not.

On this point, we can certainly keep track of the number of SQEs filled out, and call io_uring_submit if needed before peeking/waiting with timeout. It just seemed like internally io_uring_submit already knows this.

ioquatix avatar May 11 '21 11:05 ioquatix

The API is a bit messy because a lot was added since the beginning, e.g. there is io_uring_submit_and_wait() but it lacks and so on. At some point we'll need to get a separate more convenient set of helpers for submit/wait.

isilence avatar May 11 '21 12:05 isilence

check for io_uring_wait_cqe_timeout(). I suspect this is what you need.

Concerning your questioning about whether or not the wait functions will also submit the sqes. I did review the code and all is good. submit will be performed only if needed. That is if:

  • There is something to submit or
  • SQPOLL is not present or needs to be wakened

io_uring_submit_and_wait() is nice too. Note that this second function doesn't have any cqe output pointer param. imho this is a small inconsistency of the API compared to all the other wait_cqe functions but it is not a very big deal because once you return from that function, you can use the macro: io_uring_for_each_cqe()

which is essentially free of any function calls so it is very lightweight.

Here is an example of how to use the macro:

  unsigned completed = 0;
  unsigned head;
  struct io_uring_cqe *cqe;

  io_uring_for_each_cqe(&ring, head, cqe)  {
      ++completed;
      iouring_process_cqe(cqe);
  }
  if (completed) {
      io_uring_cq_advance(&ring, completed);
  }

the key point to remember is that 1 system call is used to submit and wait so when using the API, make sure that you don't end up making 2 system calls for doing the 2 operations.

lano1106 avatar May 12 '21 19:05 lano1106

Userspace fills SQEs, so it should know when it needs to submit anything or not.

On this point, we can certainly keep track of the number of SQEs filled out, and call io_uring_submit if needed before peeking/waiting with timeout. It just seemed like internally io_uring_submit already knows this.

Check this thread where I have an issue about how to know if there is sqes to submit or not: https://github.com/axboe/liburing/issues/342

The short answer is doing this: unsigned to_submit = iouring_ring.sq.sqe_tail - iouring_ring.sq.sqe_head;

just like it is done in function __io_uring_flush_sq()

lano1106 avatar May 12 '21 19:05 lano1106

check for io_uring_wait_cqe_timeout(). I suspect this is what you need.

I tried this but it failed to work correctly because it does not always call submit apparently.

	io_uring_submit(&arguments->data->ring);
	
	int result = io_uring_wait_cqe_timeout(&arguments->data->ring, arguments->cqes, arguments->timeout);

Commenting out io_uring_submit fails some of our tests.

ioquatix avatar May 13 '21 03:05 ioquatix

check for io_uring_wait_cqe_timeout(). I suspect this is what you need.

I tried this but it failed to work correctly because it does not always call submit apparently.

	io_uring_submit(&arguments->data->ring);
	
	int result = io_uring_wait_cqe_timeout(&arguments->data->ring, arguments->cqes, arguments->timeout);

Commenting out io_uring_submit fails some of our tests.

Weird. Looked up the code, if you have the new enough libruing and you always pass non-NULL timeout it should submit. It's not guaranteed to not change, though.

But agree that we should refine API and add more straightforward variants of the functions

isilence avatar May 16 '21 22:05 isilence

We need a function which:

  • If there are completion events, return immediately with the count, otherwise
  • Submits if required, then
  • Waits for at least one event, and
  • Returns the total number of events available.

I don't know if we can depend on the current interface, so we need to either:

  • Use syscalls directly (would like to avoid).
  • Always call io_uring_submit first (two system calls).

I'm sure my use case is fairly typical.

ioquatix avatar May 17 '21 00:05 ioquatix