truffleruby
truffleruby copied to clipboard
Sidekiq is slow on TruffleRuby
From @nateberkopec https://twitter.com/nateberkopec/status/1096880039491133441 Instructions to reproduce at https://gist.github.com/nateberkopec/a612498e9719af6b8775ecdcb8afaa18 (time to process 20 000 empty jobs with Sidekiq).
We should investigate why it's slow.
hiredis
does not work yet (#1546).
@nateberkopec Does it matter whether the benchmark is run with the Rails app, or just running Sidekiq after the script created the jobs is enough? That would simplify the setup.
I don't think it matters whether or not this is run in a Rails app.
EDIT: I don't have the numbers ready to copypaste, but for me, Truffle native completed the benchmark in about a minute, and MRI was about 40 seconds.
I was running this today and here's what I found:
I also saw pauses between batches of jobs, although short ones, for native but not jvm. Using the --vm.XX:+PrintGC
I could confirm these were pauses for GC.
I did run with RAILS_ENV=production
as run in development will do many checks for file updates.
I saw these values: MRI 2.6.5 ~ 10 seconds
TruffleRuby native
~ 65 seconds
and then with some configuration:
TRUFFLERUBYOPT='--vm.Xmx16G --vm.Xmn4G'
~ 34 seconds
TRUFFLERUBYOPT='--vm.Xmx16G --vm.Xmn4G --experimental-options --engine.Mode=latency'
~ 26 seconds
With TruffleRuby I saw some delay in startup ( ~10-15 seconds of requiring/loading) which is an area I have been working on improving recently.
latest results are Truffleruby head (23 sec) vs MRI Ruby 2.7 (23 sec)
On par is progress (thanks for rerunning!), but I think TruffleRuby should be faster here, especially if it's allowed to process jobs in parallel. Seems worth investigating once the new CPUSampler flamegraph support is in.
New numbers from @mperham: https://twitter.com/eregontp/status/1479534410546483201 We should investigate what's slower. Those results seem to be with toxiproxy.
Thanks. I appreciate you investigating this.
I tried running the benchmark but quickly run into non-fatal errors which happen regularly. It also happens on CRuby although less often as CRuby runs the benchmark for less long. https://github.com/mperham/sidekiq/issues/5137
There is also another type of error when running on TruffleRuby:
2022-01-24T12:33:58.904Z pid=138840 tid=fzk ERROR: Error fetching job: Interrupted system call
2022-01-24T12:33:58.905Z pid=138840 tid=fts WARN: Errno::EINTR: Interrupted system call
2022-01-24T12:33:58.905Z pid=138840 tid=ftc WARN: exception.c:61:in `rb_syserr_fail'
exception.c:61:in `rb_syserr_fail'
exception.c:72:in `rb_sys_fail'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/ext/hiredis_ext/connection.c:423:in `__get_reply'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/ext/hiredis_ext/connection.c:458:in `connection_read'
/home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext_ruby.rb:41:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/lib/hiredis/ext/connection.rb:19:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/connection/hiredis.rb:55:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:280:in `block in read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:268:in `io'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:279:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:131:in `block in call'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:248:in `block (2 levels) in process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:389:in `ensure_connected'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:238:in `block in process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:325:in `logging'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:237:in `process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:131:in `call'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:226:in `block in call_with_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:300:in `with_socket_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:225:in `call_with_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1224:in `block in _bpop'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
/home/eregon/.rubies/truffleruby-dev/lib/mri/monitor.rb:218:in `mon_synchronize'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1221:in `_bpop'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1266:in `brpop'
/home/eregon/code/sidekiq/lib/sidekiq/fetch.rb:47:in `block in retrieve_work'
/home/eregon/code/sidekiq/lib/sidekiq.rb:100:in `block in redis'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:63:in `block (2 levels) in with'
<internal:core> core/thread.rb:94:in `handle_interrupt'
<internal:core> core/thread.rb:94:in `handle_interrupt'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:62:in `block in with'
<internal:core> core/thread.rb:94:in `handle_interrupt'
<internal:core> core/thread.rb:94:in `handle_interrupt'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:59:in `with'
/home/eregon/code/sidekiq/lib/sidekiq.rb:97:in `redis'
/home/eregon/code/sidekiq/lib/sidekiq/fetch.rb:47:in `retrieve_work'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:83:in `get_one'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:95:in `fetch'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:77:in `process_one'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:68:in `run'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:67:in `run'
/home/eregon/code/sidekiq/lib/sidekiq/util.rb:56:in `watchdog'
/home/eregon/code/sidekiq/lib/sidekiq/util.rb:65:in `block in safe_thread'
This might be a bug in hiredis to not handle EINTR correctly (EINTR should always be retried).
UPDATE: actually it's an issue of rb_thread_fd_select() in TruffleRuby => GR-36556.
That might not happen on CRuby due to Thread.handle_interrupt(Exception => :never)
, not sure.
And this not only creates errors but also prevents profiling, because profiling needs to know the thread backtraces (like Thread#backtrace
) and that needs to interrupt system calls with SIGVTALRM, very similar to what CRuby does for e.g. Ctrl+C.
FYI, in my recent tests not using hiredis improves a bit overall performance. Not sure if using it brings any benefits to Redis on Truffleruby
I see different numbers now: 45s vs 21s . Still much slower but difference is not x4 anymore, which is good UPDATE: I rerun it without hiredis and saw similar results 45s vs 14s, so in case of TruffleRuby it doesn't make much difference
Right, we should look at this again. I was focused on Ruby 3 keyword arguments and other things but that's done now, and the fix has been merged in sidekiq (https://github.com/mperham/sidekiq/commit/1efbaddcefb75a958872f73b5614ea91ddf35be6).
Actually on master sidekiqload now doesn't use hiredis, so I think we should start with that, hiredis likely only adds complexity. For instance there is an issue with interruption of rb_thread_fd_select() in TruffleRuby, only happens with hiredis (GR-36556), and that prevents profiling (obviously we should fix that, but good if it doesn't block this, and it doesn't seem so easy).
I've been taking a look at this./ TruffleRUby is faster at the JSON generation required for creating jobs,. but it's slower at IO for a variety of reasons. The JSON generation could likely be made faster by optimising Float#to_s
which is most of the time spent on the JSON generation, and I'm looking at using poll
instead of select
for IO waiting. to reduce the overheads in that part of the code.
We also appear to be much slower at parsing the job json, so validating the job by dumping it to json and parsing it back is not working so well on TruffleRuby.