Linux's io_uring IO interface (2x performance vs libevent)

Open lbguilherme opened this issue 4 years ago • 16 comments

This is a WIP implementation of the io_uring interface into Crystal's scheduler. I'm opening this PR for early review.

This was only tested with Linux 5.12 on an x86_64 machine but should work on Linux 5.4+.

Here is a benchmark to demonstrate the current performance gains:

require "http/server"

{% if flag?(:preview_iouring) %}
  require "./stdlib_patch"
{% end %}

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!"
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen

Before:

$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
  12 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.85ms  268.64us  11.66ms   88.43%
    Req/Sec     9.51k   344.83    16.35k    74.86%
  6814608 requests in 1.00m, 656.39MB read
Requests/sec: 113544.61
Transfer/sec:     10.94MB

After (with -Dpreview_iouring):

$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
  12 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   422.39us  196.79us  14.97ms   95.13%
    Req/Sec    18.96k     1.96k  156.16k    92.56%
  13583654 requests in 1.00m, 1.28GB read
Requests/sec: 226021.87
Transfer/sec:     21.77MB

(Linux 5.12, Intel Xeon E-2174G, inside a Docker container)
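As a quick check of the "2x" figure in the title, the ratio can be computed from the two `Requests/sec` lines above. This is only a sketch of the arithmetic; the variable names are illustrative.

```crystal
# Requests/sec figures copied from the two wrk runs above.
before_rps = 113_544.61
after_rps  = 226_021.87

# Ratio of the io_uring run to the libevent run.
speedup = after_rps / before_rps
puts "Throughput speedup: #{speedup.round(2)}x"
```

The ratio comes out just under 2, which matches the headline claim for this particular benchmark and machine.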

Hopefully, this can come in time for Crystal 1.1.0 :D

Fixes #10740. Informs #10766.

lbguilherme avatar May 30 '21 20:05 lbguilherme

Wow, this looks great 👍

It's a lot of stuff 😄 We should try to extract some parts into preliminary PRs.

The syscall feature should definitely be a separate item.

straight-shoota avatar May 31 '21 11:05 straight-shoota

Updates:

This currently exposes the same primitives as LibEvent (resume, timeout, wait_readable, and wait_writeable) through almost the same interface as the original Crystal::EventLoop. There are no changes specific to any particular IO. Reads, writes, accepts, connects, etc. work as they always have: they call into LibC, and if the call would block, they use the event loop to wait for readiness.

This isn't the most performant way to use io_uring (every IO is still making a system call), but at least it is fully working and is already 30% faster than libevent2 on my synthetic benchmark. Doing IO directly with io_uring can be added later for extra performance.
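The readiness-based flow described above (try the call, and only wait on the event loop if it would block) can be sketched as follows. This is a simulation under stated assumptions, not the PR's code: `FakeSocket`, `try_read`, and `read_with_readiness` are hypothetical names, and `wait_readable` here merely stands in for the event-loop primitive of the same name.

```crystal
# Hypothetical stand-in for a non-blocking file descriptor: the first
# read attempt "would block", later attempts return data.
class FakeSocket
  getter waits = 0
  @ready = false

  # Mimics a non-blocking read: nil stands for "would block" (EAGAIN).
  def try_read : String?
    @ready ? "Hello world!" : nil
  end

  # Stand-in for the event loop's wait_readable. In a real scheduler the
  # current fiber would be suspended here until the fd becomes readable.
  def wait_readable : Nil
    @waits += 1
    @ready = true
  end
end

# Attempt the call first; only when it would block, wait for readiness
# and then retry.
def read_with_readiness(socket : FakeSocket) : String
  loop do
    if data = socket.try_read
      return data
    end
    socket.wait_readable
  end
end

socket = FakeSocket.new
puts read_with_readiness(socket)
puts "waited #{socket.waits} time(s)"
```

Because the read itself is still an ordinary call, swapping libevent for io_uring only changes what happens inside the wait step, which is why every IO still costs a system call in this design.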

It adds two compile-time flags and one runtime environment variable that are only meaningful on Linux:

  • -Dpreview_iouring enables runtime detection of a supported kernel. It will use io_uring if available and fall back to libevent otherwise.
  • -Dforce_iouring will always use io_uring unconditionally. On systems that don't support it, the program will fail to start.
  • CRYSTAL_DISABLE_IO_URING=1 is only meaningful when using -Dpreview_iouring. It disables io_uring at runtime.
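The interaction of the two flags and the environment variable can be summarised in a small decision function. This is a sketch, not the PR's implementation: `use_io_uring?` and its parameters are hypothetical, and giving `-Dforce_iouring` precedence when both flags are set is an assumption.

```crystal
# Hypothetical helper: name and parameters are illustrative only.
def use_io_uring?(force : Bool, preview : Bool,
                  disabled_by_env : Bool, kernel_supported : Bool) : Bool
  return true if force             # -Dforce_iouring: always io_uring (assumed to win)
  return false unless preview      # neither flag: libevent as before
  return false if disabled_by_env  # CRYSTAL_DISABLE_IO_URING=1
  kernel_supported                 # runtime detection decides
end

# Compile-time flags become plain booleans through the `flag?` macro method.
force    = {{ flag?(:force_iouring) }}
preview  = {{ flag?(:preview_iouring) }}
disabled = ENV["CRYSTAL_DISABLE_IO_URING"]? == "1"

# `kernel_supported` is hard-coded here; a real check would probe the kernel.
puts use_io_uring?(force, preview, disabled, kernel_supported: true)
```

Compiled without either flag, this prints `false`, reflecting that the default build keeps using libevent.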

For this to work Crystal::Event was modified to be an abstract struct and this touches the Windows port (@straight-shoota).

I couldn't enable specs because GitHub Actions runs with Kernel 5.4, which doesn't support everything we use (particularly timeout cancelation). I can probably get it working down to this version, I'll try. Specs are passing on my machine.

Note that the diff includes everything from #10777.

lbguilherme avatar Jun 05 '21 13:06 lbguilherme

Got it working fine on Linux 5.4 and CI is happy :D

I'm now marking it as ready to review.

A few notes for reviewers:

  • It only works in Linux 5.4 and 5.5 with -Dforce_iouring. Using just -Dpreview_iouring won't enable it because the code used for support detection only works on 5.6+. I don't think this is really an issue.
  • I didn't test it much together with -Dpreview_mt. As both technologies are "preview" I don't think it would be wise to use them together for now. Either way, the handling is simple: one ring per thread.
  • Executables built with -Dforce_iouring will fail to start on Linux earlier than 5.1 and will work on Linux 5.2 and 5.3, but with all timeouts resolving instantaneously. This means sleep will always behave as Fiber.yield and any IO with a timeout will fail if it needs to be async. This flag really shouldn't be used unless the person is sure it will run on Linux 5.4+.
  • I'm not really sure about the behavior after a fork. The manpages are vague, but they seem to indicate that ongoing IO operations are canceled in the child. The current implementation recreates the ring in the child to ensure it won't interfere with the parent. It at least works for a "fork-exec" and I didn't test much besides that.
  • It currently exposes the same interface as libevent and doesn't take advantage of the ability to do direct IO without system calls. It's already 30% faster than libevent now. Reaching the full potential with direct IOs will be part of future PRs.
  • Due to a change in Crystal::Event, it might cause small conflicts with the Windows effort.
  • The IoUring class is quite complicated and low-level, but I tried my best to explain the logic with comments. I hope it helps.
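The kernel-version notes above can be condensed into a lookup. This is only a summary sketch of what the notes state: `Outcome`, `forced_outcome`, and `detected_by_preview?` are hypothetical names, and Linux 5.1 is marked as unspecified because the notes do not cover it.

```crystal
enum Outcome
  FailsToStart    # io_uring is not available
  Unspecified     # Linux 5.1: not covered by the notes above
  InstantTimeouts # runs, but every timeout resolves immediately
  Works
end

# Behavior of an executable built with -Dforce_iouring.
def forced_outcome(major : Int32, minor : Int32) : Outcome
  version = {major, minor}
  if version < {5, 1}
    Outcome::FailsToStart
  elsif version == {5, 1}
    Outcome::Unspecified
  elsif version < {5, 4}
    Outcome::InstantTimeouts
  else
    Outcome::Works
  end
end

# With only -Dpreview_iouring, the detection code recognises 5.6+,
# so 5.4 and 5.5 fall back to libevent.
def detected_by_preview?(major : Int32, minor : Int32) : Bool
  version = {major, minor}
  version >= {5, 6}
end

puts forced_outcome(5, 3)
puts detected_by_preview?(5, 5)
```

Tuples compare element by element, so `{5, 3} < {5, 4}` behaves like a version comparison without any string parsing.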

lbguilherme avatar Jun 06 '21 12:06 lbguilherme

Regarding the non-detection on kernel < 5.6, we could change the runtime configuration to allow forcing either alternative:

  • CRYSTAL_EVENTLOOP=io_uring forces io_uring
  • CRYSTAL_EVENTLOOP=libevent forces libevent
  • CRYSTAL_EVENTLOOP=auto (or undefined) uses detection
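A minimal sketch of how such a variable could be parsed, assuming the three values listed above. `EventLoopChoice` and `event_loop_choice` are hypothetical names, and raising on an unknown value is an assumption; the proposal does not say what should happen in that case.

```crystal
enum EventLoopChoice
  IoUring
  Libevent
  Auto
end

# Hypothetical parser for the proposed CRYSTAL_EVENTLOOP variable.
def event_loop_choice(value : String?) : EventLoopChoice
  case value
  when "io_uring"  then EventLoopChoice::IoUring
  when "libevent"  then EventLoopChoice::Libevent
  when "auto", nil then EventLoopChoice::Auto # unset means detection
  else
    # Assumption: reject typos loudly rather than silently falling back.
    raise ArgumentError.new("unknown CRYSTAL_EVENTLOOP value: #{value}")
  end
end

puts event_loop_choice(ENV["CRYSTAL_EVENTLOOP"]?)
```

Compared with a single disable switch, one variable with explicit values also covers the 5.4 and 5.5 case, where io_uring works but is not auto-detected.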

straight-shoota avatar Jun 06 '21 15:06 straight-shoota

I have synced it with master and fixed a few remaining issues. I would like to ask for another round of reviews.

lbguilherme avatar Jan 12 '22 21:01 lbguilherme

I've just re-discovered this great PR. Makes me sad it's still sitting here for almost a year.

@straight-shoota Is there something that's stopping this PR from being merged from the core team's perspective?

vlazar avatar May 22 '22 11:05 vlazar

As a reminder, there are no public API changes and everything is behind a "preview" compile flag, so there is very little risk of breaking something in production.

Either way, I believe the primary concern is long-term maintainability, as this is fairly complex code to adopt. It is understandable that a change like this takes time to be accepted.

If there is anything I can do, I'm available.

lbguilherme avatar May 24 '22 02:05 lbguilherme

In /usr/share/crystal/src/crystal/system/unix/event_libevent.cr:19:11

 19 | def add(timeout : Time::Span?) : Nil
              ^------
Warning: positional parameter 'timeout' corresponds to parameter 'time_span' of the overridden method Crystal::Event#add(time_span : Time::Span | ::Nil), which has a different name and may affect named argument passing
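The warning points at a parameter-name mismatch between an abstract method and its implementation. A minimal sketch of the situation and one possible fix follows; `SketchEvent` and `SketchLibevent` are hypothetical stand-ins, not the actual classes in the PR.

```crystal
# Hypothetical stand-in for the abstract event type.
abstract struct SketchEvent
  abstract def add(time_span : Time::Span?) : Nil
end

struct SketchLibevent < SketchEvent
  # Writing `timeout : Time::Span?` here would still compile but triggers
  # the warning, because a caller using `add(time_span: ...)` would no
  # longer match this implementation. Reusing the parent's parameter name
  # keeps named-argument calls working.
  def add(time_span : Time::Span?) : Nil
    puts "armed with #{time_span.inspect}"
  end
end

event = SketchLibevent.new
event.add(time_span: 5.seconds) # named argument matches the abstract def
event.add(nil)                  # positional calls are unaffected either way
```

Positional calls behave the same under both names, which is why this surfaces as a warning rather than an error.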

carlhoerberg avatar Oct 20 '22 11:10 carlhoerberg

The latest merges from master I did introduced some bug or flaky behavior on the CI tests and I'll have to investigate what's going on to fix that.

Before I put any more effort into this work, I want to ask the core maintainers: is this PR something that would eventually be accepted into Crystal, or will the added complexity do more harm than good, meaning it shouldn't be merged anytime soon? If it's the latter, let's close this.

lbguilherme avatar Oct 21 '22 13:10 lbguilherme

I would very much like to merge this. More in https://github.com/crystal-lang/crystal/issues/10740#issuecomment-1287079088

Thanks for your great work pushing it forward! I think with the event loop for windows now in place we should be able to continue with the refactoring necessary for io_uring integration.

straight-shoota avatar Oct 21 '22 14:10 straight-shoota

Hey there @lbguilherme @straight-shoota - highly looking forward to having this for Crystal! :heart:

What are the next steps here? Could we update the tracking issue https://github.com/crystal-lang/crystal/issues/10740 with a roadmap, if necessary?

z64 avatar Dec 27 '22 23:12 z64

Hi @z64! I've had very little time to work on this lately, so things are going slowly. The current status is that the code is done and has been refactored in line with current master. It should be working, but there is a bug causing the CI to segfault and I couldn't find the cause yet. Help is always appreciated, ofc.

lbguilherme avatar Dec 28 '22 01:12 lbguilherme

Apparently there are also "I/O rings" on Windows since 21H2: https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/

They are apparently very similar to io_uring: https://windows-internals.com/ioring-vs-io_uring-a-comparison-of-windows-and-linux-implementations/

HertzDevil avatar May 26 '23 17:05 HertzDevil

There are, but they are different enough that it would still take a lot of work to support Windows. And I don't know how much is supported; I only read that original article about it, and it was not really sufficient when it comes to what operations are there.

yxhuvud avatar May 26 '23 18:05 yxhuvud

It should be working, but there is a bug causing the CI to seg fault and couldn't find the cause yet.

Could any Crystal maintainer take a look? This improvement is of great importance.

paulocoghi avatar Oct 10 '23 07:10 paulocoghi

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/curious-about-the-eventloop-updates/6825/1

crysbot avatar May 07 '24 16:05 crysbot