liburing
liburing giving similar throughput as select/epoll
I am currently working on this branch adding io_uring as an option to receive data for a Linux networking benchmark tool, running on Ubuntu with Linux kernel 5.13.
When I run the benchmark test using liburing or select, I get similar throughput levels and am unsure why that would be since liburing is asynchronous. Here is the file where I am using liburing to receive data. Not sure what I am missing to get the performance boost that io_uring offers.
I have tried using registered buffers and registered files, but they don't seem to help much with performance. I have also tried enabling polling mode, but that only seems to increase CPU usage.
In your code (tcpstream.c), the SQPOLL and IOPOLL features are not used; you can try them.
@mohsenomidi, I always had problems with that test. Apart from not testing anything useful, the numbers are strange at the least... Just ran it on my laptop, 5.14. The number of iterations is increased x100 for pipes=100, otherwise it's under 1s.
# iter count = 1000000,
> taskset -c 3 ./io_uring 100
Pipes: 100
Time: 42.052680
> taskset -c 3 ./epoll 100
Pipes: 100
Time: 86.737568
# iter count = 10000,
> taskset -c 3 ./epoll 500
Pipes: 500
Time: 5.386944 # edited from 48.026645
> taskset -c 3 ./io_uring 500
Pipes: 500
Time: 2.106056
edit: for pipes=500, that's me screwing up the test, so it's 5.3 vs 2.1
p.s. the second result (42 vs 2) looks weird, maybe the test is buggy.
Sockets don't support IOPOLL. SQPOLL shouldn't be needed either and may complicate the code.
@MarcoF1, good work.
You don't use registered files unless sqe->flags |= IOSQE_FIXED_FILE is set.
@MarcoF1, it needs to be investigated, but my guess is that you don't do enough batching, i.e. submitting several requests at once, and if I'm reading your code correctly that is why it doesn't make much of a difference. For instance, each io_uring_submit() ends up doing a syscall per request.
What is the magnitude of the difference? Can you share some numbers? Also, I guess there are N parallel clients running, each using an io_uring instance. Right?