Linux's io_uring IO interface (2x performance vs libevent)
This is a WIP integration of the io_uring interface into Crystal's scheduler. I'm opening this PR for early review.
This was only tested with Linux 5.12 on an x86_64 machine but should work on Linux 5.4+.
Here is a benchmark to demonstrate the current performance gains:
require "http/server"
{% if flag?(:preview_iouring) %}
  require "./stdlib_patch"
{% end %}
server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!"
end
address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen
Before:
$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
12 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.85ms 268.64us 11.66ms 88.43%
Req/Sec 9.51k 344.83 16.35k 74.86%
6814608 requests in 1.00m, 656.39MB read
Requests/sec: 113544.61
Transfer/sec: 10.94MB
After (with -Dpreview_iouring):
$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
12 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 422.39us 196.79us 14.97ms 95.13%
Req/Sec 18.96k 1.96k 156.16k 92.56%
13583654 requests in 1.00m, 1.28GB read
Requests/sec: 226021.87
Transfer/sec: 21.77MB
(Linux 5.12, Intel Xeon E-2174G, inside a Docker container)
Hopefully, this can come in time for Crystal 1.1.0 :D
Fixes #10740. Informs #10766.
Wow, this looks great 👍
It's a lot of stuff 😄 We should try to extract some parts into preliminary PRs.
The syscall feature should definitely be a separate item.
Updates:
This currently exposes the same primitives as LibEvent (resume, timeout, wait_readable, and wait_writeable) through almost the same interface as the original Crystal::EventLoop. There are no changes specific to any particular IO. Reads, writes, accepts, connects, etc. work as they always have: they call into LibC, and if the call would block, they use the event loop to wait for readiness.
This isn't the most performant way to use io_uring (every IO is still making a system call), but at least it is fully working and is already 30% faster than libevent2 on my synthetic benchmark. Doing IO directly with io_uring can be added later for extra performance.
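The readiness-based pattern described above (try the call, and if it would block, wait on the event loop and retry) can be sketched roughly as follows. This is only an illustration: `retry_until_ready` and both procs are hypothetical names, not code from this PR. `try_op` stands in for a non-blocking LibC call that returns nil when it would block, and `wait_ready` stands in for something like wait_readable.

```crystal
# Hypothetical helper, not part of the PR: retries a non-blocking operation
# until it succeeds, waiting for readiness whenever it would block.
def retry_until_ready(try_op : -> Int32?, wait_ready : -> Nil) : Int32
  loop do
    # A non-nil result means the operation completed (0 is truthy in Crystal).
    if bytes = try_op.call
      return bytes
    end
    # Otherwise the call would have blocked: park until the event loop
    # reports readiness, then try again.
    wait_ready.call
  end
end

# Simulated operation that "would block" twice before returning 12 bytes.
attempts = 0
waits = 0
try_op = -> { attempts += 1; attempts < 3 ? nil : 12 }
wait_ready = -> { waits += 1; nil }

bytes = retry_until_ready(try_op, wait_ready)
```

Every attempt in this pattern is still a system call, which is why doing the IO directly through the ring is left as a later optimization.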
It adds two compile-time flags and one runtime environment variable that are only meaningful on Linux:
- -Dpreview_iouring enables runtime detection of a supported kernel. It will use io_uring if available and fall back to libevent otherwise.
- -Dforce_iouring always uses io_uring unconditionally. On systems that don't support it, the program will fail to start.
- CRYSTAL_DISABLE_IO_URING=1 is only meaningful when using -Dpreview_iouring. It disables io_uring at runtime.
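How the three switches could combine is sketched below. The `Backend` enum and `choose_backend` are hypothetical names for illustration, and `kernel_supported` stands in for the PR's runtime detection, which is not reproduced here. The sketch also assumes that the force flag takes precedence when both flags are set, that libevent is used when neither flag is given, and that only the value "1" disables io_uring.

```crystal
# Hypothetical names; this is not the PR's actual selection code.
enum Backend
  IoUring
  LibEvent
end

def choose_backend(force : Bool, preview : Bool, env_disable : String?, kernel_supported : Bool) : Backend
  # -Dforce_iouring: always io_uring, no detection and no env override.
  return Backend::IoUring if force
  # Neither flag given: assumed to keep the existing libevent loop.
  return Backend::LibEvent unless preview
  # -Dpreview_iouring with CRYSTAL_DISABLE_IO_URING=1: opt out at runtime.
  return Backend::LibEvent if env_disable == "1"
  # -Dpreview_iouring: use io_uring only when the kernel supports it.
  kernel_supported ? Backend::IoUring : Backend::LibEvent
end

# Wiring it to the real flags and environment; the `true` below is a
# placeholder for the kernel detection result.
backend = choose_backend(
  force: {{ flag?(:force_iouring) }},
  preview: {{ flag?(:preview_iouring) }},
  env_disable: ENV["CRYSTAL_DISABLE_IO_URING"]?,
  kernel_supported: true
)
puts backend
```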
For this to work Crystal::Event was modified to be an abstract struct and this touches the Windows port (@straight-shoota).
I couldn't enable specs because GitHub Actions runs with Kernel 5.4, which doesn't support everything we use (particularly timeout cancelation). I can probably get it working down to this version, I'll try. Specs are passing on my machine.
Note that the diff includes everything from #10777.
Got it working fine on Linux 5.4 and CI is happy :D
I'm now marking it as ready to review.
A few notes for reviewers:
- It only works on Linux 5.4 and 5.5 with -Dforce_iouring. Using just -Dpreview_iouring won't enable it because the code used for support detection only works on 5.6+. I don't think this is really an issue.
- I didn't test it much together with -Dpreview_mt. As both technologies are "preview" I don't think it would be wise to use them together for now. Either way, the handling is simple: one ring per thread.
- Executables built with -Dforce_iouring will fail to start on Linux earlier than 5.1 and will work on Linux 5.2 and 5.3, but with all timeouts resolving instantaneously. This means sleep will always behave as Fiber.yield and any IO with a timeout will fail if it needs to be async. This flag really shouldn't be used unless the person is sure it will run on Linux 5.4+.
- I'm not really sure about the behavior after a fork. The man pages are vague, but they seem to indicate that ongoing IO operations are canceled on the child. The current implementation recreates the ring on the child to ensure it won't interfere with the parent. It at least works for a "fork-exec" and I didn't test much besides that.
- It currently exposes the same interface as libevent and doesn't take advantage of the ability to do direct IO without system calls. It's already 30% faster than libevent now. Reaching the full potential with direct IOs will be part of future PRs.
- Due to a change on Crystal::Event it might cause small conflicts with the Windows effort.
- The IoUring class is quite complicated and low-level, but I tried my best to explain the logic with comments. I hope it helps.
Regarding the non-detection on kernel < 5.6, we could change the runtime configuration to allow forcing either alternative:
- CRYSTAL_EVENTLOOP=io_uring forces io_uring
- CRYSTAL_EVENTLOOP=libevent forces libevent
- CRYSTAL_EVENTLOOP=auto (or undefined) uses detection
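A minimal sketch of how such a variable might be parsed, assuming the three values proposed above. `EventLoopChoice` and `parse_event_loop_choice` are hypothetical names, and raising on an unrecognized value is an assumption since the proposal doesn't say what should happen in that case.

```crystal
# Hypothetical names; illustrates the proposed CRYSTAL_EVENTLOOP values only.
enum EventLoopChoice
  IoUring
  LibEvent
  Auto
end

def parse_event_loop_choice(value : String?) : EventLoopChoice
  case value
  when "io_uring"  then EventLoopChoice::IoUring
  when "libevent"  then EventLoopChoice::LibEvent
  when "auto", nil then EventLoopChoice::Auto # unset behaves like "auto"
  else
    # Assumption: reject unknown values instead of silently falling back.
    raise ArgumentError.new("unknown CRYSTAL_EVENTLOOP value: #{value}")
  end
end

choice = parse_event_loop_choice(ENV["CRYSTAL_EVENTLOOP"]?)
puts choice
```

With this shape, kernels older than 5.6 could opt in at runtime with CRYSTAL_EVENTLOOP=io_uring instead of requiring a separate build with -Dforce_iouring.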
I have synced it with master and fixed a few remaining issues. I would like to ask for another round of reviews.
I've just re-discovered this great PR. Makes me sad it's been sitting here for almost a year.
@straight-shoota Is there something that's stopping this PR from being merged from the core team's perspective?
As a reminder, there are no public API changes and everything is behind a "preview" compile flag, so there is very little risk of breaking something in production.
Either way, I believe the primary concern is about maintainability in the long run, as this is fairly complex code to adopt. It is understandable for a change like this to take a while to be accepted.
If there is anything I can do, I'm available.
In /usr/share/crystal/src/crystal/system/unix/event_libevent.cr:19:11
19 | def add(timeout : Time::Span?) : Nil
^------
Warning: positional parameter 'timeout' corresponds to parameter 'time_span' of the overridden method Crystal::Event#add(time_span : Time::Span | ::Nil), which has a different name and may affect named argument passing
The latest merges from master I did introduced some bug or flaky behavior on the CI tests and I'll have to investigate what's going on to fix that.
Before I put any more effort into this work I want to ask the core maintainers: Is this PR something that would eventually be accepted into Crystal, or will the added complexity do more harm than good, meaning this shouldn't be merged anytime soon? If it's the latter, let's close this.
I would very much like to merge this. More in https://github.com/crystal-lang/crystal/issues/10740#issuecomment-1287079088
Thanks for your great work pushing it forward! I think with the event loop for windows now in place we should be able to continue with the refactoring necessary for io_uring integration.
Hey there @lbguilherme @straight-shoota - highly looking forward to having this for Crystal! :heart:
What are the next steps here? Could we update the tracking issue https://github.com/crystal-lang/crystal/issues/10740 with a roadmap, if necessary?
Hi @z64! I'm having very little time to work on this lately, so things are going slowly. The current status is that the code is done and has been refactored in line with current master. It should be working, but there is a bug causing the CI to segfault and I couldn't find the cause yet. Help is always appreciated, ofc.
Apparently there are also "I/O rings" on Windows since 21H2: https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/
They are apparently very similar to io_uring: https://windows-internals.com/ioring-vs-io_uring-a-comparison-of-windows-and-linux-implementations/
There are, but they are different enough that it would still take a lot of work to support Windows. And I don't know how much is supported; I only read that original article about it, and that was not really sufficient when it comes to which operations are available.
> It should be working, but there is a bug causing the CI to seg fault and couldn't find the cause yet.
Could any Crystal maintainer take a look? This improvement is of great importance.
This pull request has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/curious-about-the-eventloop-updates/6825/1