ARTEMIS-3163 Experimental support for Netty IO_URING incubator

Open franz1981 opened this issue 3 years ago • 12 comments

https://issues.apache.org/jira/browse/ARTEMIS-3163

franz1981 avatar Mar 07 '21 14:03 franz1981

These are my results using a single single-threaded acceptor for both clients and replication (on the live broker), to compare epoll and io_uring fairly under load. The test is similar to the one in https://issues.apache.org/jira/browse/ARTEMIS-2852, with 32 JMS core clients and 100 persistent bytes messages. The IO_URING transport was used only on the live server; everything else (backup and clients) was left as is.

NOTE: These are just preliminary results, so I won't share the HW configuration or anything else needed to make this reproducible, but they should give an idea of the magnitude of the improvement offered by io_uring.

master:

**************
EndToEnd Throughput: 22582 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean               1410.83
min                 333.82
50.00%             1368.06
90.00%             1679.36
99.00%             2293.76
99.90%             3489.79
99.99%            13107.20
max               16187.39
count               320000

this pr:

**************
EndToEnd Throughput: 30540 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean               1052.52
min                 329.73
50.00%             1007.62
90.00%             1286.14
99.00%             1736.70
99.90%             4653.06
99.99%            13893.63
max               16711.68
count               320000
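
To make the comparison between the two runs easier to read, here is a minimal Python sketch that computes the relative change of each reported figure. The numbers are copied from the two result blocks above; the dictionary keys and the `relative_change` helper are names chosen for this illustration, not part of any benchmark tool.

```python
# Figures copied from the two result blocks above.
# Throughput is in ops/sec, latencies are in microseconds.
master = {
    "throughput": 22582, "mean": 1410.83, "p50": 1368.06, "p90": 1679.36,
    "p99": 2293.76, "p99.9": 3489.79, "p99.99": 13107.20, "max": 16187.39,
}
this_pr = {
    "throughput": 30540, "mean": 1052.52, "p50": 1007.62, "p90": 1286.14,
    "p99": 1736.70, "p99.9": 4653.06, "p99.99": 13893.63, "max": 16711.68,
}


def relative_change(before, after):
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100.0


# Relative change of every metric, master -> this PR.
changes = {name: relative_change(master[name], this_pr[name]) for name in master}

for name, pct in changes.items():
    print(f"{name:>10}: {pct:+6.1f}%")
```

With these inputs, throughput rises by roughly 35% and the mean, median, 90th and 99th percentile latencies drop by roughly a quarter, while the 99.9th percentile and above are higher in the PR run. Since this is a single preliminary run per branch, the tail figures in particular should be read with caution.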

The profile data collected with https://github.com/jvm-profiling-tools/async-profiler/ is attached to https://issues.apache.org/jira/browse/ARTEMIS-3163

But the important bits are:

  • Replication event loop thread: 935 (epoll) vs 775 (io_uring) samples -> ~94% cpu usage vs 78% cpu usage
  • Syscall samples: ~61% of samples (epoll) vs ~31% of samples (io_uring)
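
As a rough sanity check of the two bullets above, the following Python sketch relates the sample counts to the quoted CPU percentages. It assumes (this is not stated in the thread) that both profiles cover a sampling window of about the same length, so that samples divided by CPU share gives a comparable total; the variable names are illustrative only.

```python
# Numbers quoted in the bullets above.
epoll_samples, io_uring_samples = 935, 775
epoll_cpu, io_uring_cpu = 0.94, 0.78          # approximate CPU usage of the event loop thread
epoll_syscall, io_uring_syscall = 0.61, 0.31  # approximate share of samples spent in syscalls

# ASSUMPTION: samples / cpu share estimates the total samples available in each
# profile; similar totals suggest the two profiles are comparable.
implied_total_epoll = epoll_samples / epoll_cpu
implied_total_io_uring = io_uring_samples / io_uring_cpu

# Relative reduction in event loop samples and in the syscall share.
sample_reduction = (epoll_samples - io_uring_samples) / epoll_samples
syscall_share_reduction = (epoll_syscall - io_uring_syscall) / epoll_syscall

print(f"implied total samples: epoll ~{implied_total_epoll:.0f}, io_uring ~{implied_total_io_uring:.0f}")
print(f"event loop samples reduced by {sample_reduction:.1%}")
print(f"syscall share reduced by {syscall_share_reduction:.1%}")
```

Both implied totals come out close to 1000 samples, which is consistent with the quoted percentages. Under that assumption the event loop thread takes about 17% fewer samples with io_uring, and the share of samples spent in syscalls is roughly halved.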

The io_uring version uses resources far more efficiently than epoll, even though our replication process already tries to batch writes as much as possible to amortize the syscall cost. It would be interesting to compare io_uring with an OpenOnload kernel-bypass driver using epoll :P

IMPORTANT: Why did I choose to use a single thread for everything? Don't be tempted to use the default configuration, because it uses 3 * available cores for the replication/client acceptors. The io_uring version is so much more efficient than epoll that the Netty event loops tend to be idle most of the time and need to be woken up, so application threads always pay the cost of waking the event loop threads. This can make the io_uring version look worse than epoll, while the opposite is true(!!)

franz1981 avatar Mar 07 '21 17:03 franz1981

Docs need updating.

michaelandrepearce avatar Mar 07 '21 19:03 michaelandrepearce

@franz1981 what's the status of this one? I'm keen to merge it, and happy to contribute any last bits myself next week if needed, such as a slight code re-org of the if statements and the docs that are needed.

michaelandrepearce avatar Sep 06 '21 10:09 michaelandrepearce

what's the status of this one? I'm keen to merge it, and happy to contribute any last bits myself next week if needed, such as a slight code re-org of the if statements and the docs that are needed.

That sounds nice: I would just like to do some more thorough testing to help users know which kernel version (and incubator version) to use. Let it sit here for a week, and then we can move on and maybe decide with a public vote whether people are interested and would like to know what it is and what its purpose is. We can also have a call with the community to explain it :)

franz1981 avatar Sep 07 '21 07:09 franz1981

@franz1981, just a friendly reminder that it's been a week. This looks like a nice feature so it would be nice to get it out to the community if it's ready.

jbertram avatar Sep 17 '21 00:09 jbertram

@jbertram

just a friendly reminder that it's been a week. This looks like a nice feature so it would be nice to get it out to the community if it's ready.

I don't know yet if we should bring this in without any reflection usage (as @michaelandrepearce suggested), and we need to add a doc paragraph stating that we have no idea (yet) whether SSL or other features work as expected, i.e. an EXPERIMENTAL tag on everything. If anyone in the community would like to raise the appropriate issues after this is merged, I can happily work on the Netty side too, to fix things. Let me know if it makes sense to bring this to the community list.

franz1981 avatar Sep 17 '21 06:09 franz1981

I think we should run the extended suite using io_uring... if it passes, we should merge it upstream.

clebertsuconic avatar Sep 21 '21 14:09 clebertsuconic

@clebertsuconic I agree we should merge upstream asap, but we should be using the maven dependency, and some docs are definitely needed before the merge. @franz1981 my intent was to send you a PR with changes and docs, but I had a disastrous work week. I'm hoping to get to it next week for you; I've actually just booked some hours out of the calendar to make sure I get the time.

@franz1981 - Just a note: on machines with Solarflare cards with kernel bypass via ONLOAD, the perf improvement is not really noticeable. But I borrowed a machine without special network kit, and there was a quite noticeable latency improvement and drop in resource usage for the same loads, essentially matching what your own tests had already shown. So it does mean the improvement is reproducible outside your lab.

michaelandrepearce avatar Sep 24 '21 22:09 michaelandrepearce

@franz1981 I sent a PR to your branch (the one this PR comes from) with my proposed changes to address the comments I had raised here, see: https://github.com/franz1981/activemq-artemis/pull/15. If you agree with those, feel free to merge them into your branch, and I have no further comments.

Basically, I addressed 3 key bits:

  1. Logging order, so entries are in ascending id order and it's easy during future development to know the next id to use
  2. Use Netty binaries via maven and remove reflection
  3. Documentation, with a clear note about incubator status

michaelandrepearce avatar Sep 28 '21 11:09 michaelandrepearce

@franz1981, @clebertsuconic, @michaelandrepearce, where are we on this? Is this something we still want to do? Netty's io_uring lib is still in their incubator, but it's up to 0.16 now.

jbertram avatar Dec 16 '22 04:12 jbertram

@jbertram I would still like to see this in; as long as we mark it as experimental or in incubation in the docs, I personally support this. @franz1981, this PR is on your branch atm and the changes I sent you are still pending. If you don't have time anymore, I can send in a PR from my branch (closing this one) and merge/rebase on current master.

michaelandrepearce avatar Jan 12 '23 17:01 michaelandrepearce

@michaelandrepearce, given the silence from @franz1981 I'd say you should go ahead and send a PR of your own with the changes you recommended (plus resolving the existing conflicts). Then we can move forward. Thanks!

jbertram avatar Jan 23 '23 02:01 jbertram