truffleruby
roda/puma: big differences in performance between TruffleRuby 2.1.0-dev and MRI 2.7
Used https://github.com/jodosha/ruby-web-bench. TruffleRuby is almost 50x slower than MRI 2.7.
git clone https://github.com/jodosha/ruby-web-bench.git
cd ruby-web-bench
gem install roda puma
rackup apps/roda-10000.ru
wrk -t 2 http://localhost:9292/j/j/j/j
TruffleRuby 2.1.0-dev
Test 1:
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 441.70ms 267.22ms 1.50s 84.94%
Req/Sec 15.35 9.53 40.00 68.91%
237 requests in 10.03s, 19.50KB read
Requests/sec: 23.64
Transfer/sec: 1.95KB
Test 2:
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 359.08ms 190.81ms 1.28s 83.17%
Req/Sec 17.33 9.90 40.00 67.88%
288 requests in 10.08s, 23.62KB read
Requests/sec: 28.56
Transfer/sec: 2.34KB
MRI 2.7
Test 1:
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.58ms 5.23ms 55.07ms 76.65%
Req/Sec 540.71 70.98 650.00 75.50%
10782 requests in 10.02s, 0.86MB read
Requests/sec: 1075.74
Transfer/sec: 88.26KB
Test 2:
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.14ms 5.75ms 67.73ms 78.11%
Req/Sec 511.93 67.57 650.00 66.50%
10210 requests in 10.02s, 837.73KB read
Requests/sec: 1018.99
Transfer/sec: 83.61KB
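To make the size of the gap concrete, here is a small sketch that averages the Requests/sec figures from the four runs above and divides one mean by the other. The variable names and the `mean` helper are illustrative only; the numbers are copied from the wrk output as posted.

```ruby
# Requests/sec figures copied from the wrk runs above (two runs per runtime).
truffleruby_rps = [23.64, 28.56]
mri_rps         = [1075.74, 1018.99]

# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum / values.size
end

# How many times more requests per second MRI served, on average.
ratio = mean(mri_rps) / mean(truffleruby_rps)
puts format("MRI served about %.1fx more requests per second", ratio)
```

Averaged over both runs the gap is roughly 40x; comparing only the first run of each gives about 45x, which is presumably where "almost 50x" comes from.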
10 seconds is likely not enough for TruffleRuby to warm up and for important methods to be JIT-compiled.
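One way to check whether warmup explains the numbers is to look at throughput over consecutive time windows instead of a single 10-second total. The sketch below is a hypothetical helper (`throughput_per_window` is not part of wrk or TruffleRuby) that calls a block in a loop and reports calls per second for each window. It is single-threaded, so its absolute numbers will not match wrk's 10 connections; only the trend across windows is of interest.

```ruby
# Hypothetical helper: runs the block repeatedly and returns the number of
# completed calls per second for each consecutive time window.
def throughput_per_window(windows:, window_seconds:)
  Array.new(windows) do
    count = 0
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Keep calling the block until this window's time budget is used up.
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) - started < window_seconds
      yield
      count += 1
    end
    count / window_seconds.to_f
  end
end

# Example use against the benchmark server (assumes it is already running
# on localhost:9292, as in the commands above):
#
#   require "net/http"
#   uri = URI("http://localhost:9292/j/j/j/j")
#   rates = throughput_per_window(windows: 6, window_seconds: 10) do
#     Net::HTTP.get_response(uri)
#   end
#   rates.each_with_index { |r, i| puts format("window %d: %.1f req/s", i + 1, r) }
```

If the rate keeps climbing from one window to the next, the runtime is probably still compiling and a longer wrk run (for example with `-d 60`) would give a fairer picture.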
@eregon I tried a build with https://github.com/oracle/truffleruby/commit/d1230ebe1257d85dd1e7de7fd12ab5a482230ebb The numbers are a bit better for TruffleRuby, but not by much. The problem I see is high latency (always over 1 second in all of my tests). When I tried a Rails application, many requests timed out.
@deepj Could you also try with 1 route on roda on puma? https://github.com/jodosha/ruby-web-bench/blob/3babcb8a86f47673504632e19d1c4369ea684d0b/apps/roda-10000.ru is a bit weird honestly, 32000 lines, 10000+ blocks (so it's going to be very slow to JIT all those), probably not representative of anything realistic.
@eregon Honestly, there is not much difference except the latency.
apps/roda-1.ru
Test 1 (warm)
wrk -t 2 http://localhost:9292/j/j/j/j
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 218.72ms 107.00ms 705.87ms 84.60%
Req/Sec 26.44 12.81 60.00 59.88%
472 requests in 10.07s, 29.96KB read
Requests/sec: 46.88
Transfer/sec: 2.98KB
Test 2 (hot)
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 369.36ms 240.25ms 1.14s 78.30%
Req/Sec 18.16 11.02 50.00 67.20%
287 requests in 10.09s, 18.22KB read
Requests/sec: 28.45
Transfer/sec: 1.81KB
Test 3 (hot)
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 110.25ms 100.44ms 791.70ms 90.21%
Req/Sec 55.01 23.13 130.00 62.23%
1051 requests in 10.02s, 66.71KB read
Requests/sec: 104.86
Transfer/sec: 6.66KB
apps/roda-10000.ru
Test 1 (warm)
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 255.94ms 133.55ms 951.48ms 82.86%
Req/Sec 21.43 10.31 50.00 58.62%
404 requests in 10.08s, 33.14KB read
Requests/sec: 40.07
Transfer/sec: 3.29KB
Test 2 (hot)
Running 10s test @ http://localhost:9292/j/j/j/j
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 169.10ms 239.53ms 1.38s 91.42%
Req/Sec 53.91 26.85 150.00 75.31%
925 requests in 10.03s, 75.88KB read
Requests/sec: 92.19
Transfer/sec: 7.56KB
BTW:
32000 lines, 10000+ blocks (so it's going to be very slow to JIT all those), probably not representative of anything realistic.
Anything is possible in the corporate world. I can very easily imagine this as a highly likely scenario there.
Not strictly related, but don't want to open another issue. Testing on a routing benchmark, I found the following results:
(Performance and memory usage graphs not included.)
I tried different engine options, such as disabling inlining and splitting, but the difference is still noticeable.
@deepj Please remember that when testing against apps/*-1.ru you should do GET /, instead of GET /j/j/j/j.
Thanks for sharing those results, they indeed look unexpectedly slow. We'll look into it.
As mentioned before, 10 seconds is probably not enough for TruffleRuby, so I extended the run to 30 seconds and got the following results:
TruffleRuby head
novoi@L35975MEU-2 ruby-web-bench % wrk -t 2 -d 30 http://localhost:9292/
Running 30s test @ http://localhost:9292/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 84.36ms 209.80ms 1.03s 88.49%
Req/Sec 2.33k 809.06 3.14k 81.90%
110224 requests in 30.01s, 6.83MB read
Requests/sec: 3672.86
Transfer/sec: 233.14KB
MRI Ruby 2.7
novoi@L35975MEU-2 ruby-web-bench % ruby -v
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
novoi@L35975MEU-2 ruby-web-bench % wrk -t 2 -d 30 http://localhost:9292/
Running 30s test @ http://localhost:9292/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.60ms 8.85ms 111.87ms 81.49%
Req/Sec 1.09k 93.02 1.29k 81.00%
64890 requests in 30.02s, 4.14MB read
Requests/sec: 2161.44
Transfer/sec: 141.18KB
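The two 30-second runs above point in different directions: TruffleRuby serves more requests per second, yet its average latency is much higher. The sketch below pulls both figures out of the wrk summary text and prints the two ratios. `to_ms` and `parse_wrk` are hypothetical helpers written for this comparison, and they only handle the `us`, `ms` and `s` suffixes that appear in output like the above.

```ruby
# Convert a wrk duration such as "84.36ms" or "1.03s" to milliseconds.
def to_ms(text)
  value = text.to_f
  case text
  when /us\z/ then value / 1000.0
  when /ms\z/ then value
  when /s\z/  then value * 1000.0
  else raise ArgumentError, "unsupported unit in #{text.inspect}"
  end
end

# Extract average latency and Requests/sec from a wrk summary.
def parse_wrk(output)
  latency = output[/Latency\s+(\S+)/, 1]
  rps     = output[/Requests\/sec:\s+([\d.]+)/, 1]
  { avg_latency_ms: to_ms(latency), requests_per_sec: rps.to_f }
end

# Relevant lines copied from the two 30-second runs above.
truffleruby = parse_wrk(<<~OUT)
  Latency 84.36ms 209.80ms 1.03s 88.49%
  Requests/sec: 3672.86
OUT
mri = parse_wrk(<<~OUT)
  Latency 7.60ms 8.85ms 111.87ms 81.49%
  Requests/sec: 2161.44
OUT

throughput_ratio = truffleruby[:requests_per_sec] / mri[:requests_per_sec]
latency_ratio    = truffleruby[:avg_latency_ms] / mri[:avg_latency_ms]
puts format("throughput: %.2fx, average latency: %.1fx", throughput_ratio, latency_ratio)
```

With these figures TruffleRuby has about 1.7x the throughput of MRI but about 11x the average latency, with a standard deviation of 209.80ms and a maximum of 1.03s, so the average is likely dominated by a minority of very slow requests.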
@deepj