seastar
reactor: io_uring: enable some optimization flags
Enable some optimization flags in an attempt to improve performance with io_uring:
IORING_SETUP_COOP_TASKRUN - prevents a completion from interrupting the reactor if it is running. Requires that the reactor issue an io_uring_enter system call in a timely fashion, but thanks to the task quota timer, we do.
IORING_SETUP_TASKRUN_FLAG - sets up a flag that notifies the reactor that the kernel has pending completions that it did not process. This allows the reactor to issue an io_uring_enter even if it has no pending submission queue entries or completion queue entries (i.e. the flag indicates that a third queue, in the kernel, is not empty).
IORING_SETUP_SINGLE_ISSUER - elides some locking by guaranteeing that only a single thread plays with the ring; this happens to be true for us.
IORING_SETUP_DEFER_TASKRUN - batches up completion processing in an attempt to get some more performance.
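As a rough illustration of the flag set listed above (not the code from this patch), here is a self-contained C++ sketch that ORs the four bits together and checks one constraint from io_uring_setup(2): DEFER_TASKRUN is only accepted together with SINGLE_ISSUER. The `sketch` namespace, the lowercase constant names, and both helpers are hypothetical. The bit positions are copied by hand from `<linux/io_uring.h>`, and real code should take them from that header.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hand-copied mirrors of the IORING_SETUP_* bits (assumption: real code
// should include <linux/io_uring.h> or liburing instead of redefining them).
constexpr std::uint32_t setup_coop_taskrun  = 1u << 8;   // IORING_SETUP_COOP_TASKRUN
constexpr std::uint32_t setup_taskrun_flag  = 1u << 9;   // IORING_SETUP_TASKRUN_FLAG
constexpr std::uint32_t setup_single_issuer = 1u << 12;  // IORING_SETUP_SINGLE_ISSUER
constexpr std::uint32_t setup_defer_taskrun = 1u << 13;  // IORING_SETUP_DEFER_TASKRUN

// DEFER_TASKRUN is only valid when SINGLE_ISSUER is also set.
constexpr bool flags_consistent(std::uint32_t flags) {
    const bool defer = (flags & setup_defer_taskrun) != 0;
    const bool single = (flags & setup_single_issuer) != 0;
    return !defer || single;
}

// Hypothetical helper: the combination of flags described above.
inline std::uint32_t reactor_setup_flags() {
    const std::uint32_t flags = setup_coop_taskrun | setup_taskrun_flag
                              | setup_single_issuer | setup_defer_taskrun;
    assert(flags_consistent(flags));
    return flags;
}

} // namespace sketch
```

With liburing, a value like this would be stored in `io_uring_params::flags` before calling `io_uring_queue_init_params()`.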
These flags bump the dependencies up to Linux 6.1 and liburing 2.2. This seems worthwhile, as right now io_uring lags behind linux-aio (which processes completions from interrupt context and therefore doesn't need all these optimizations).
After this exercise, io_uring is still slower than linux-aio.
These flags bump the dependencies up to Linux 6.1 and liburing 2.2. This seems worthwhile, as right now io_uring lags behind linux-aio (which processes completions from interrupt context and therefore doesn't need all these optimizations). However, I don't know how to specify the liburing version requirement.
lemme create a change to bump the required version to v2.2
following patch would do the trick:
diff --git a/cmake/SeastarDependencies.cmake b/cmake/SeastarDependencies.cmake
index 6c80d0fa..9aff230b 100644
--- a/cmake/SeastarDependencies.cmake
+++ b/cmake/SeastarDependencies.cmake
@@ -133,7 +133,7 @@ macro (seastar_find_dependencies)
   seastar_set_dep_args (Protobuf REQUIRED
     VERSION 2.5.0)
   seastar_set_dep_args (LibUring
-    VERSION 2.0
+    VERSION 2.2
     OPTION ${Seastar_IO_URING})
   seastar_set_dep_args (StdAtomic REQUIRED)
   seastar_set_dep_args (hwloc
Thanks, I'll update my patch. Unfortunately, I still see a 10% slowdown; I don't think my changes did anything.
v2: applies patch from @tchaikov to bump the uring version dependency
Enable some optimization flags in an attempt to improve performance with io_uring: .. After this exercise, io_uring is still slower than linux-aio.
Can you share some example numbers showing how much slower io_uring still is than linux-aio, and how much your patch improved things?
Client:
ab -k -c 100 -n 1000000 http://localhost:10000/
Server:
./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend linux-aio
Requests per second: 110300.38 [#/sec] (mean)
Server:
./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend io_uring
Requests per second: 96563.98 [#/sec] (mean)
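To put the two `ab` figures above on a common scale, the gap can be expressed as a fraction of the linux-aio throughput. This is a minimal sketch; `relative_slowdown` is a hypothetical helper name, and the inputs are just the two numbers quoted from this single run.

```cpp
#include <cassert>

// Fraction by which `candidate` throughput falls short of `baseline`.
// 0.0 means equal throughput; 0.10 means the candidate is 10% slower.
inline double relative_slowdown(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline;
}
```

Calling `relative_slowdown(110300.38, 96563.98)` gives roughly 0.125, i.e. io_uring was about 12.5% slower than linux-aio in this run.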
Usually linux-aio and io_uring are within 10% of each other. I'm not sure whether my patch improves anything; the results are noisy.
The most I was able to see is a large difference in IPC, but I have no idea how that happens.
I did find a difference:
[avi@avi seastar (uring-poll-first)]$ sudo perf stat -a -e irq_vectors:reschedule_entry ./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend io_uring
INFO 2024-02-11 18:18:18,608 seastar - Reactor backend: io_uring
starting prometheus API server
Seastar HTTP server listening on port 10000 ...
^CStoppping HTTP server
Stoppping Prometheus server
Performance counter stats for 'system wide':
980,159 irq_vectors:reschedule_entry
14.433292048 seconds time elapsed
[avi@avi seastar (uring-poll-first)]$ sudo perf stat -a -e irq_vectors:reschedule_entry ./build/release/apps/httpd/httpd --smp 1 --cpuset 0 -m 1G --reactor-backend linux-aio
INFO 2024-02-11 18:18:40,236 seastar - Reactor backend: linux-aio
starting prometheus API server
Seastar HTTP server listening on port 10000 ...
^CStoppping HTTP server
Stoppping Prometheus server
Performance counter stats for 'system wide':
496 irq_vectors:reschedule_entry
13.693591716 seconds time elapsed
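The two `perf stat` runs above have slightly different durations, so it can help to normalize the counts to a rate. This is a small sketch with a hypothetical helper, `events_per_second`; the elapsed time covers the whole run, including any idle time around the benchmark, so the results are averages.

```cpp
#include <cassert>

// Average event rate over a perf-stat run.
inline double events_per_second(double count, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return count / elapsed_seconds;
}
```

With the numbers above, `events_per_second(980159, 14.433292048)` is roughly 68,000 reschedule interrupts per second for io_uring, while `events_per_second(496, 13.693591716)` is about 36 per second for linux-aio.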
I thought the patch addressed those self-interrupts, but maybe not.
Ah that run was without the patch.
Unfortunately, the patch doesn't help sufficiently; io_uring is still slower.
Follow-up: https://github.com/scylladb/seastar/pull/2092
On ARM, these two patches change io_uring from ~ -12% to ~ +2%. So maybe there's hope.