[Question] IOPOLL + SQPOLL + ATTACH_WQ

Open akseg73 opened this issue 2 years ago • 2 comments

In order to reduce latency for some of our usage, we are thinking of utilizing IOPOLL + SQPOLL. Can you please advise us on the following?

  1. Is IORING_SETUP_SQPOLL + IORING_SETUP_IOPOLL + IORING_SETUP_ATTACH_WQ supported? Is there a unit test that demonstrates the usage?

  2. It looks like IORING_SETUP_IOPOLL is supported only when O_DIRECT is also specified. When O_DIRECT is specified, is the OS buffer pool completely ignored? Is there any chance that IOPOLL can be supported without O_DIRECT? With buffered writes we are able to absorb multiple writes to a single page into a single disk write. That could be an advantage for improving disk life as well as throughput if a lot of pages see multiple writes. If not, what choice do we have?

  3. There is a paper by Samsung that compares SPDK with io_uring, and they suggest not utilizing IOPOLL together with SQPOLL. Can you please advise if there is a reason not to utilize the two together?

https://www.usenix.org/sites/default/files/conference/protected-files/vault20_slides_lund.pdf

  4. I hope sq_thread_idle is really in milliseconds; we do not want this thread to execute unless our usage really demands it. I am asking because our usage seems to indicate that even in the absence of I/O we see severe degradation in performance, which might suggest that this thread is executing past the idle timeout.

  5. I will separately provide some performance measurements. Basically, for bulk reads we see very high throughput, but where we have to do synchronous I/O and latency is an issue we still have to tune it, possibly by utilizing IOPOLL as well, unless you can advise anything better.

  6. What can we do to reduce read latency if we are unable to utilize O_DIRECT and some of our reads are effectively synchronous?

  7. Simply enabling IORING_SETUP_SQPOLL starts to interfere with the performance of our product. If we disable it, then performance is as expected. This is without even performing any I/O. We simply enable IORING_SETUP_SQPOLL and the product starts to see all kinds of performance issues. I would have imagined that sq_thread_idle is supposed to take care of this issue.

  8. If we utilize IOPOLL, is the polling accomplished automatically with io_uring_submit_and_wait()? That would be a great interface. Or alternatively, if one were to utilize io_uring_submit(), would the polling be done by io_uring_wait_cqe()?

I am very grateful for the help you have provided so far, without it we would not be able to get this working at all. I hope not to take much more of your time.

akseg73 avatar Aug 09 '21 02:08 akseg73

You might be interested in looking at #385 regarding your IOPOLL questions. But in general, you can either use io_uring_submit_and_wait() or a combination of io_uring_submit() and io_uring_wait_cqe(). In both cases io_uring would poll the devices instead of waiting for the device to interrupt.

int io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr) Same as io_uring_submit(), but takes an additional parameter wait_nr that lets you specify how many completions to wait for. This call will block until wait_nr submission requests are processed by the kernel and their details placed in the completion queue.
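To make the "wait" part a bit more concrete: underneath liburing, waiting for completions comes down to an io_uring_enter() call with IORING_ENTER_GETEVENTS set, and on an IOPOLL ring that call is where the kernel polls the device. The helper below is only a simplified sketch of that flag decision, not liburing's actual code; the name wait_enter_flags is made up for illustration.

```c
#include <assert.h>
#include <linux/io_uring.h>

/* Hypothetical helper: choose io_uring_enter() flags for a call that should
 * wait for wait_nr completions on a ring created with setup_flags.
 * Simplified sketch; liburing's real logic has more cases. */
static unsigned wait_enter_flags(unsigned setup_flags, unsigned wait_nr)
{
    unsigned flags = 0;

    assert((setup_flags & ~0xffffu) == 0); /* sanity check for this sketch only */

    /* Waiting for completions, or reaping them on an IOPOLL ring, needs
     * GETEVENTS so the kernel looks for (or polls for) completions. */
    if (wait_nr > 0 || (setup_flags & IORING_SETUP_IOPOLL))
        flags |= IORING_ENTER_GETEVENTS;

    return flags;
}
```

In normal use you would not call this yourself; io_uring_submit_and_wait() and io_uring_wait_cqe() take care of it.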

aosterthun avatar Aug 11 '21 08:08 aosterthun

Hi akseg73, I'm not an io_uring expert, but I guess Jens and Pavel are busy right now, so allow me to answer some of your questions as best I can.

In order to reduce latency for some of our usage, we are thinking of utilizing IOPOLL + SQPOLL. Can you please advise us on the following?

  1. Is IORING_SETUP_SQPOLL + IORING_SETUP_IOPOLL + IORING_SETUP_ATTACH_WQ supported? Is there a unit test that demonstrates the usage?

I'm not sure whether a test like this exists somewhere, but AFAIK liburing/test/sq-poll-share.c shows how to use SQPOLL + ATTACH_WQ, and there is a separate IOPOLL test as well. Hope that helps.
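For illustration only, here is a minimal sketch of how the three flags from the question could be combined in struct io_uring_params. The helper name init_polled_params and its parent_ring_fd argument are made up for this example; whether the kernel accepts the combination depends on the kernel version, so treat this as a starting point to test rather than a confirmed recipe.

```c
#include <assert.h>
#include <string.h>
#include <linux/io_uring.h>

/* Hypothetical helper: fill *p for a ring that uses an SQ polling thread
 * (SQPOLL) and polls the device for completions (IOPOLL). If
 * parent_ring_fd >= 0, also request ATTACH_WQ so the new ring attaches to
 * the backend of that existing ring. */
static void init_polled_params(struct io_uring_params *p, int parent_ring_fd)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));

    p->flags = IORING_SETUP_SQPOLL | IORING_SETUP_IOPOLL;

    if (parent_ring_fd >= 0) {
        p->flags |= IORING_SETUP_ATTACH_WQ;
        p->wq_fd = (__u32)parent_ring_fd; /* fd of the already created ring */
    }
}
```

The filled-in params would then be passed to io_uring_queue_init_params() from liburing. When building against liburing, include <liburing.h> rather than <linux/io_uring.h>, since it provides its own copy of these definitions.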

  2. It looks like IORING_SETUP_IOPOLL is supported only when O_DIRECT is also specified.

Yes.

When O_DIRECT is specified is the OS buffer pool completely ignored?

If 'the OS buffer pool' means the page cache, then yes.
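One practical consequence of bypassing the page cache is that O_DIRECT I/O generally needs aligned buffers, lengths and offsets. A small sketch of allocating such a buffer follows; the helper name alloc_direct_io_buffer is made up, and the alignment you need (4096 in the usage below) is an assumption that depends on the device and filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a buffer suitable for O_DIRECT I/O.
 * align must be a power of two. The length is rounded up to a multiple of
 * align and returned through *padded_len when that pointer is non-NULL.
 * Returns NULL on allocation failure; release the buffer with free(). */
static void *alloc_direct_io_buffer(size_t len, size_t align, size_t *padded_len)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t rounded = (len + align - 1) & ~(align - 1);
    void *buf = aligned_alloc(align, rounded); /* C11; size is a multiple of align */

    if (buf != NULL && padded_len != NULL)
        *padded_len = rounded;
    return buf;
}
```

For example, alloc_direct_io_buffer(5000, 4096, &n) yields a 4096-aligned buffer with n set to 8192.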

Is there any chance that IOPOLL can be supported without O_DIRECT? Due to buffered writes

AFAIK, IOPOLL polls the device for completions, while buffered I/O mostly just touches memory (the page cache), so polling the device doesn't make sense in that case.

we are able to absorb multiple writes to a single page into a single disk write. That could be an advantage for improving disk life as well as improving throughput if a lot of pages see multiple writes. If not, what choice do we have?

  3. There is a paper by Samsung that compares SPDK with io_uring and they suggest not to utilize IOPOLL together with SQPOLL. Can you please advise if there is a reason not to utilize the two together?

No clue about that, since I don't know the background of that paper.

https://www.usenix.org/sites/default/files/conference/protected-files/vault20_slides_lund.pdf

  4. I hope sq_thread_idle is really in milliseconds; we do not want this thread to execute unless our usage really demands it. I am asking because our usage seems to indicate that even in the absence of I/O we see severe degradation in performance, which might suggest that this thread is executing past the idle timeout.

Sorry, I'm not following. Do you mean the sqthread consumes CPU even when there are only a few I/Os? In that case performance (latency, IOPS) should be fine, since the sqthread is running.

In the absence of I/O requests, performance may degrade because the sqthread is most likely sleeping and has to be woken up if you set the idle time to a small value; the wakeup and scheduling take some time.

  5. I will separately provide some performance measurements. Basically for bulk reads we see very high throughput, but where we have to do synchronous i/o and latency is an issue we still have to tune it to possibly utilize IOPOLL as well. Unless you can advise any better.
  6. What can we do to reduce read latency if we are unable to utilize O_DIRECT and some of our reads are effectively synchronous.
  7. Simply enabling IORING_SETUP_SQPOLL starts to interfere with the performance of our product. If we disable it then performance is as expected. This is without even performing any i/o. We simply enable IORING_SETUP_SQPOLL and the product starts to see all kinds of performance issues. I would have imagined that sq_thread_idle is supposed to take care of this issue.

Even with no I/O requests submitted? That is interesting. Could you post a perf profile of the sqthread? By the way, do you bind the sqthread to a specific CPU, and is it running on the same CPU as your application?
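In case it helps with the CPU binding question, this is roughly how the sqthread can be pinned through struct io_uring_params. The helper name and the CPU number and idle time in the usage note are placeholders for illustration, not recommendations.

```c
#include <assert.h>
#include <string.h>
#include <linux/io_uring.h>

/* Hypothetical helper: fill *p for an SQPOLL ring whose SQ thread is bound
 * to the given CPU and goes to sleep after idle_ms milliseconds without
 * submissions. */
static void init_pinned_sqpoll_params(struct io_uring_params *p,
                                      unsigned cpu, unsigned idle_ms)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));

    /* SQ_AFF tells the kernel to honor sq_thread_cpu. */
    p->flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
    p->sq_thread_cpu = cpu;
    p->sq_thread_idle = idle_ms;
}
```

For example, init_pinned_sqpoll_params(&p, 3, 100) asks for an sqthread on CPU 3 that sleeps after 100 ms of inactivity; picking a CPU the application threads do not run on avoids the two competing for the same core.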

  8. If we utilize IOPOLL, then is the polling accomplished automatically with io_uring_submit_and_wait() ? That would be a great interface. Or alternatively if one were to utilize io_uring_submit(), would the polling be done by io_uring_wait_cqe().

I am very grateful for the help you have provided so far, without it we would not be able to get this working at all. I hope not to take much more of your time.

HowHsu avatar Sep 26 '21 13:09 HowHsu