truffleruby roda/puma: big differences in performance between TruffleRuby 2.1.0-dev and MRI 2.7

I used https://github.com/jodosha/ruby-web-bench for the benchmark. TruffleRuby is almost 50x slower than MRI 2.7 in requests per second 🙀😿😾

git clone https://github.com/jodosha/ruby-web-bench.git
cd ruby-web-bench
gem install roda puma
rackup apps/roda-10000.ru
wrk -t 2 http://localhost:9292/j/j/j/j

TruffleRuby 2.1.0-dev

Test 1:

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   441.70ms  267.22ms   1.50s    84.94%
    Req/Sec    15.35      9.53    40.00     68.91%
  237 requests in 10.03s, 19.50KB read
Requests/sec:     23.64
Transfer/sec:      1.95KB

Test 2:

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   359.08ms  190.81ms   1.28s    83.17%
    Req/Sec    17.33      9.90    40.00     67.88%
  288 requests in 10.08s, 23.62KB read
Requests/sec:     28.56
Transfer/sec:      2.34KB

MRI 2.7

Test 1:

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.58ms    5.23ms  55.07ms   76.65%
    Req/Sec   540.71     70.98   650.00     75.50%
  10782 requests in 10.02s, 0.86MB read
Requests/sec:   1075.74
Transfer/sec:     88.26KB

Test 2:

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.14ms    5.75ms  67.73ms   78.11%
    Req/Sec   511.93     67.57   650.00     66.50%
  10210 requests in 10.02s, 837.73KB read
Requests/sec:   1018.99
Transfer/sec:     83.61KB
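
As a quick check of the "almost 50x" figure, the ratio can be computed from the Requests/sec lines above. This is only a sketch: the numbers are copied from the four wrk runs in this post, and the names `RESULTS` and `slowdown` are made up for illustration.

```ruby
# Requests/sec values copied from the wrk output above.
RESULTS = {
  "Test 1" => { truffleruby: 23.64, mri: 1075.74 },
  "Test 2" => { truffleruby: 28.56, mri: 1018.99 },
}.freeze

# How many times more requests per second MRI served than TruffleRuby,
# rounded to one decimal place.
def slowdown(run)
  (run[:mri] / run[:truffleruby]).round(1)
end

RESULTS.each do |name, run|
  puts "#{name}: MRI served #{slowdown(run)}x the requests/sec of TruffleRuby"
end
```

With these inputs the ratio comes out at roughly 45x for the first pair of runs and roughly 36x for the second, so "almost 50x" is an upper estimate based on the first run.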

Feb 28 '20 01:02 deepj

10 seconds is likely not enough for TruffleRuby to warm up and for important methods to be JIT-compiled.
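
One way to see whether a run has warmed up is to split it into fixed time windows and only average the later ones. The sketch below assumes a stand-in CPU workload rather than real HTTP requests, and the helper names `iterations_per_window` and `steady_state_mean` are hypothetical, not part of wrk or TruffleRuby.

```ruby
# Count how many times the given block runs in each fixed-length time window.
# Rising counts from window to window suggest the runtime is still warming up.
def iterations_per_window(windows:, window_seconds:)
  Array.new(windows) do
    count = 0
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + window_seconds
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      yield
      count += 1
    end
    count
  end
end

# Average the per-window counts after discarding the first warmup_windows.
def steady_state_mean(samples, warmup_windows)
  measured = samples.drop(warmup_windows)
  raise ArgumentError, "no windows left after warm-up" if measured.empty?
  measured.sum.to_f / measured.size
end

if __FILE__ == $PROGRAM_NAME
  # Stand-in workload (an assumption): sum a small range on every iteration.
  samples = iterations_per_window(windows: 5, window_seconds: 0.2) do
    (1..1_000).reduce(:+)
  end
  puts "iterations per window: #{samples.inspect}"
  puts "mean after dropping 2 warm-up windows: #{steady_state_mean(samples, 2)}"
end
```

The same idea applies to wrk: run it once to warm the server, discard that result, then measure with a longer duration.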

Feb 28 '20 11:02 eregon

@eregon I tried a build with https://github.com/oracle/truffleruby/commit/d1230ebe1257d85dd1e7de7fd12ab5a482230ebb. The numbers are a bit better for TruffleRuby, but not by much. What I can see is a problem with high latency (the maximum is always over 1 second in all my tests). When I tried a Rails application, many requests timed out.

Apr 27 '20 01:04 deepj

@deepj Could you also try with 1 route on roda on puma? https://github.com/jodosha/ruby-web-bench/blob/3babcb8a86f47673504632e19d1c4369ea684d0b/apps/roda-10000.ru is a bit weird honestly, 32000 lines, 10000+ blocks (so it's going to be very slow to JIT all those), probably not representative of anything realistic.

Apr 27 '20 09:04 eregon

@eregon Honestly, there is not much difference except in the latency.

apps/roda-1.ru

Test 1 (warm)

wrk -t 2 http://localhost:9292/j/j/j/j
Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   218.72ms  107.00ms 705.87ms   84.60%
    Req/Sec    26.44     12.81    60.00     59.88%
  472 requests in 10.07s, 29.96KB read
Requests/sec:     46.88
Transfer/sec:      2.98KB

Test 2 (hot)

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   369.36ms  240.25ms   1.14s    78.30%
    Req/Sec    18.16     11.02    50.00     67.20%
  287 requests in 10.09s, 18.22KB read
Requests/sec:     28.45
Transfer/sec:      1.81KB

Test 3 (hot)

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.25ms  100.44ms 791.70ms   90.21%
    Req/Sec    55.01     23.13   130.00     62.23%
  1051 requests in 10.02s, 66.71KB read
Requests/sec:    104.86
Transfer/sec:      6.66KB

apps/roda-10000.ru

Test 1 (warm)

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   255.94ms  133.55ms 951.48ms   82.86%
    Req/Sec    21.43     10.31    50.00     58.62%
  404 requests in 10.08s, 33.14KB read
Requests/sec:     40.07
Transfer/sec:      3.29KB

Test 2 (hot)

Running 10s test @ http://localhost:9292/j/j/j/j
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   169.10ms  239.53ms   1.38s    91.42%
    Req/Sec    53.91     26.85   150.00     75.31%
  925 requests in 10.03s, 75.88KB read
Requests/sec:     92.19
Transfer/sec:      7.56KB

BTW:

"32000 lines, 10000+ blocks (so it's going to be very slow to JIT all those), probably not representative of anything realistic."

Anything is possible in the corporate world. I can very easily imagine this as a highly likely scenario there 😹

Apr 27 '20 10:04 deepj

Not strictly related, but I don't want to open another issue. Testing on a routing benchmark, I found the following results:

[Graphs: runtime with startup, runtime, and memory usage]

I tried different engine options, such as disabling inlining and splitting, but the difference is still noticeable.

May 27 '20 00:05 ElMassimo

@deepj Please remember that when testing against apps/*-1.ru you should do GET /, instead of GET /j/j/j/j. 🙂

May 27 '20 16:05 jodosha

Thanks for sharing those results, they indeed look unexpectedly slow. We'll look into it.

May 27 '20 17:05 eregon

As mentioned before, 10 seconds is probably not enough for TruffleRuby, so I extended the run to 30 seconds and got the following results:

TruffleRuby head

novoi@L35975MEU-2 ruby-web-bench % wrk -t 2 -d 30 http://localhost:9292/
Running 30s test @ http://localhost:9292/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    84.36ms  209.80ms   1.03s    88.49%
    Req/Sec     2.33k   809.06     3.14k    81.90%
  110224 requests in 30.01s, 6.83MB read
Requests/sec:   3672.86
Transfer/sec:    233.14KB

MRI Ruby 2.7

novoi@L35975MEU-2 ruby-web-bench % ruby -v
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
novoi@L35975MEU-2 ruby-web-bench % wrk -t 2 -d 30 http://localhost:9292/
Running 30s test @ http://localhost:9292/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.60ms    8.85ms 111.87ms   81.49%
    Req/Sec     1.09k    93.02     1.29k    81.00%
  64890 requests in 30.02s, 4.14MB read
Requests/sec:   2161.44
Transfer/sec:    141.18KB
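
To compare the two 30-second runs side by side, the summary lines can be pulled out of the wrk output and turned into ratios. This is a rough sketch: the helper `parse_wrk_summary` is hypothetical, it only handles latency units of us, ms, and s, and the input text is copied from the two runs above.

```ruby
# Conversion factors from the latency units handled here to milliseconds.
UNIT_TO_MS = { "us" => 0.001, "ms" => 1.0, "s" => 1000.0 }.freeze

# Extract average latency (in ms) and Requests/sec from wrk's text summary.
def parse_wrk_summary(text)
  latency = text.match(/Latency\s+([\d.]+)(us|ms|s)\b/)
  rps = text.match(/Requests\/sec:\s+([\d.]+)/)
  raise ArgumentError, "not a wrk summary" unless latency && rps

  {
    avg_latency_ms: latency[1].to_f * UNIT_TO_MS.fetch(latency[2]),
    requests_per_sec: rps[1].to_f,
  }
end

# Lines copied from the TruffleRuby head run above.
TRUFFLERUBY_OUTPUT = <<~OUT
  Latency    84.36ms  209.80ms   1.03s    88.49%
  Requests/sec:   3672.86
OUT

# Lines copied from the MRI 2.7.1 run above.
MRI_OUTPUT = <<~OUT
  Latency     7.60ms    8.85ms 111.87ms   81.49%
  Requests/sec:   2161.44
OUT

truffle = parse_wrk_summary(TRUFFLERUBY_OUTPUT)
mri = parse_wrk_summary(MRI_OUTPUT)

throughput_ratio = (truffle[:requests_per_sec] / mri[:requests_per_sec]).round(2)
latency_ratio = (truffle[:avg_latency_ms] / mri[:avg_latency_ms]).round(1)

puts "TruffleRuby requests/sec relative to MRI: #{throughput_ratio}x"
puts "TruffleRuby average latency relative to MRI: #{latency_ratio}x"
```

On these numbers TruffleRuby serves about 1.7x the requests per second of MRI, while its reported average latency is about 11x higher.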

@deepj

Aug 15 '21 11:08 gogainda
