activemq-artemis
ARTEMIS-3163 Experimental support for Netty IO_URING incubator
https://issues.apache.org/jira/browse/ARTEMIS-3163
These are my results using a single, single-threaded acceptor for both clients and replication (on the live broker) in order to fairly compare epoll vs io_uring under load. The test is similar to the one on https://issues.apache.org/jira/browse/ARTEMIS-2852, with 32 JMS core clients and 100-byte persistent messages; the io_uring transport was used only on the live server, leaving the rest (backup + clients) unchanged.
NOTE: These are just preliminary results, so I won't share the HW configuration or anything needed to reproduce them, but they should give a sense of the magnitude of improvement offered by io_uring.
master:
**************
EndToEnd Throughput: 22582 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1410.83
min 333.82
50.00% 1368.06
90.00% 1679.36
99.00% 2293.76
99.90% 3489.79
99.99% 13107.20
max 16187.39
count 320000
this PR:
**************
EndToEnd Throughput: 30540 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1052.52
min 329.73
50.00% 1007.62
90.00% 1286.14
99.00% 1736.70
99.90% 4653.06
99.99% 13893.63
max 16711.68
count 320000
The profile data collected with https://github.com/jvm-profiling-tools/async-profiler/ are attached on https://issues.apache.org/jira/browse/ARTEMIS-3163
But the important bits are:
- Replication event loop thread: 935 (epoll) vs 775 (io_uring) samples -> ~94% vs ~78% CPU usage
- SYSCALL samples:
  - epoll: ~61% samples
  - io_uring: ~31% samples
The io_uring version is far more resource-efficient than epoll, despite our replication process already trying to batch writes as much as possible to amortize the syscall cost: it would be interesting to compare io_uring against some Open-OnLoad kernel-bypass driver using epoll :P
IMPORTANT: Why have I chosen to use a single thread for everything? Don't be tempted to use the default configuration, because it uses 3 * available cores for the replication/client acceptors: the io_uring version is so much more efficient than epoll that the Netty event loops tend to go idle most of the time and need to be awakened, causing application threads to always pay the cost of waking up event-loop threads...this can make the io_uring version look worse than epoll, while it is quite the opposite(!!)
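For readers who want to repeat this comparison, here is a hedged sketch of what a single-threaded acceptor could look like in `broker.xml`. `remotingThreads` and `useEpoll` are existing Netty transport parameters in Artemis; the `useIoUring` toggle name is an assumption for illustration, not necessarily the final API introduced by this PR:

```xml
<!-- Sketch: one acceptor pinned to a single event-loop thread, so epoll
     and io_uring are compared under the same threading model.
     remotingThreads=1 overrides the default of 3 * available cores.
     useIoUring is a hypothetical parameter name. -->
<acceptors>
   <acceptor name="netty-acceptor">tcp://0.0.0.0:61616?remotingThreads=1;useEpoll=false;useIoUring=true</acceptor>
</acceptors>
```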
Docs need updating.
@franz1981 what's the status on this one? I'm keen to merge it, and happy to help contribute any last bits myself if needed next week, like a slight code re-org of the if statements and whatever docs are needed.
> what's the status on this one? I'm keen to merge it, and happy to help contribute any last bits myself if needed next week, like a slight code re-org of the if statements and whatever docs are needed.
That seems nice: I just would like to perform some better testing to help users know which kernel version (and incubator version) to use. Let it park here for a week and then we can move on, and maybe decide with a public vote if people are interested and would like to know what it is / its purpose. We can have a call with the community to explain it too :)
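As an aside for readers, the kernel side of the "which kernel version to use" question can be sanity-checked before enabling the transport. A minimal sketch: io_uring syscalls landed upstream in Linux 5.1, though Netty's incubator transport may well want something newer (assumption):

```shell
# Check whether the running kernel is new enough for io_uring syscalls.
# 5.1 is when io_uring landed upstream; treat it as a floor, not a target.
required_major=5
required_minor=1
kernel=$(uname -r)
major=${kernel%%.*}          # e.g. "5" from "5.10.0-generic"
rest=${kernel#*.}
minor=${rest%%.*}            # e.g. "10"
if [ "$major" -gt "$required_major" ] || { [ "$major" -eq "$required_major" ] && [ "$minor" -ge "$required_minor" ]; }; then
  echo "io_uring syscalls available (kernel $kernel)"
else
  echo "kernel $kernel too old for io_uring (need >= $required_major.$required_minor)"
fi
```

This only checks syscall availability; feature coverage and performance still vary between 5.x releases, which is exactly why better testing is worth doing first.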
@franz1981, just a friendly reminder that it's been a week. This looks like a nice feature so it would be nice to get it out to the community if it's ready.
@jbertram
> just a friendly reminder that it's been a week. This looks like a nice feature so it would be nice to get it out to the community if it's ready.
I don't know yet if we should bring this in without any reflection usage (as @michaelandrepearce suggested), and we need to add a doc paragraph that states we have no idea (yet) whether SSL or other features work as expected, i.e. an EXPERIMENTAL tag on everything. If any of the community people would like to raise the appropriate issues after this is merged, I can happily work on the Netty side too to fix things. Let me know if it makes sense to bring this to the community list.
I think we should run the extended suite using io_uring... if it passes, we should merge it upstream.
@clebertsuconic I agree we should merge upstream ASAP, but we should be using the Maven dependency, and some docs are definitely needed before merge. @franz1981 my intent was to send you a PR with changes and docs; I had a disastrous work week, but I'm hoping to get to it next week for you. I've actually just booked some hours out of the calendar to make sure I get the time.
@franz1981 - Just a note: on machines with Solarflare cards with kernel bypass via Onload, the perf improvement is not really noticeable. But I borrowed a machine without special network kit, and there was a quite noticeable latency improvement and a drop in resource usage for the same loads, essentially matching what your own tests had already shown. It does mean the improvement is reproducible outside your lab.
@franz1981 I sent a PR to your branch (the one this PR is from) with my proposed changes to address the comments I had raised in this PR, see: https://github.com/franz1981/activemq-artemis/pull/15 . If you agree with those then feel free to merge it to your branch, and I have no further comments.
Basically there are 3 key bits I addressed:
- Logging order, so entries are in ascending id order and it's easy for the next development to know the next id to use
- Use netty binaries via maven and remove reflection
- Documentation - with clear note about incubator status
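To illustrate the "Netty binaries via Maven" point, the incubator transport is published under coordinates roughly like the following; the artifact name and classifier are the real ones from the Netty incubator, while the version shown is just an example from the 0.0.x incubator line and should be checked against the latest release:

```xml
<!-- Netty's io_uring transport from the incubator repo; the native
     classifier must match the target platform. Version is an example. -->
<dependency>
   <groupId>io.netty.incubator</groupId>
   <artifactId>netty-incubator-transport-native-io_uring</artifactId>
   <version>0.0.16.Final</version>
   <classifier>linux-x86_64</classifier>
</dependency>
```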
@franz1981, @clebertsuconic, @michaelandrepearce, where are we on this? Is this something we still want to do? Netty's io_uring lib is still in their incubator, but it's up to 0.16 now.
@jbertram I would still like to see this go in, as long as we mark it experimental or in incubation in the docs; I personally support this. @franz1981, this PR is on your branch atm and the changes I sent you are still pending; if you don't have time anymore, I can send in a PR from my branch (closing this one) and merge/rebase on current master.
@michaelandrepearce, given the silence from @franz1981 I'd say you should go ahead and send a PR of your own with the changes you recommended (plus resolving the existing conflicts). Then we can move forward. Thanks!