truffleruby Slow specs

The test suite of https://github.com/rubocop/rubocop-ast runs in ~4 seconds in Ruby TruffleRuby takes over 2 minutes. TruffleRuby-head in 3 successive runs took 3m26s, failed (sorry no link) after 10 minutes and then took 1m37.

Is the ~30x slower run time to be expected / is there something we can do about it?
Is the uneven / unreliable runs in head something you are aware of?

Mar 09 '21 03:03 marcandre

No, it's not. all implementations with a JIT will likely be slower for tests/specs, but shouldn't be that much. In this case I guess it might be due to parser which has a really huge Ruby method, which is very bad for JITs (either too big to compile or compiled many times as there are so many different conditions in it). Probably we should find a way to generate smaller methods in the parser gem at least on TruffleRuby. We should investigate though, that's just a guess.
No, it would be helpful if you have link for a >10min run. BTW both links above are the same run.

Mar 09 '21 10:03 eregon

Oops, fixed the link to the 1m37 one. And no link for 10min run, as I did "restart all jobs" and that seem to delete the log. There was just a very generic error like "Job failed", no stack trace or nothing. I notice this morning that I set timeout-minutes to 5, so it's not clear how come it took 10 minutes to trigger the timeout. I'll update if I encounter another such run, but this gem is pretty stable now.

Mar 09 '21 16:03 marcandre

on my machine it's 43s vs 2s

Oct 12 '22 20:10 gogainda

The specs executed in 1min23s on both truffleruby and truffleruby-head in https://github.com/rubocop/rubocop-ast/actions/runs/3049983186/jobs/4916606764. While that's still slower than on CRuby it's pretty fast for CI in general, and around the same time as some other CI jobs: https://github.com/rubocop/rubocop-ast/actions/runs/3049983186/usage Locally I see:

Finished in 26.19 seconds (files took 4.77 seconds to load)
2387 examples, 0 failures
...
bundle exec rake spec  132.28s user 4.21s system 358% cpu 38.083 total

With TRUFFLERUBYOPT="--engine.Mode=latency" it goes down to:

Finished in 13.63 seconds (files took 3.85 seconds to load)

That seems good to add to rubocop-ast's CI to speed it up. (note, with TRUFFLERUBYOPT="--experimental-options --engine.Splitting=false" bundle exec rake spec it's also around 14s)

A quick profile using TRUFFLERUBYOPT=--cpusampler:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 3273 samples with period 10ms. Missed 7 samples.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread[main,5,main]
 Name                                                                                                                                                                                                      ||             Total Time    ||              Self Time    || Location             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Lexer#advance                                                                                                                                                                                             ||             4980ms  15.2% ||             2660ms   8.1% || ../../.rubies/truffleruby-dev/lib/gems/gems/parser-3.1.2.1/lib/parser/lexer.rb~11303-24828:470723-833042
 BasicObject#!=                                                                                                                                                                                            ||             1310ms   4.0% ||             1240ms   3.8% || (core)~1:0
 block (6 levels) in <top (required)>                                                                                                                                                                      ||             1150ms   3.5% ||             1150ms   3.5% || spec/rubocop/ast/traversal_spec.rb~78:1642-1692
 Hash#[]=                                                                                                                                                                                                  ||              960ms   2.9% ||              870ms   2.7% || (core)~1:0
 Kernel#gem_original_require                                                                                                                                                                               ||             2160ms   6.6% ||              720ms   2.2% || resource:/truffleruby/core/kernel.rb~226-238:7136-7516
 rb_ary_entry                                                                                                                                                                                              ||              980ms   3.0% ||              670ms   2.0% || array.c~81:0
 Hash#[]                                                                                                                                                                                                   ||             1190ms   3.6% ||              670ms   2.0% || (core)~1:0
 rb_array_len                                                                                                                                                                                              ||              640ms   2.0% ||              640ms   2.0% || array.c~15:0
 Truffle::RegexpOperations.match_in_region                                                                                                                                                                 ||              610ms   1.9% ||              610ms   1.9% || resource:/truffleruby/core/truffle/regexp_operations.rb~76-84:2354-2819
 Array#<<                                                                                                                                                                                                  ||              550ms   1.7% ||              550ms   1.7% || (core)~1:0
 Array#each                                                                                                                                                                                                ||            10670ms  32.6% ||              520ms   1.6% || (core)~1:0
 Kernel#freeze                                                                                                                                                                                             ||              470ms   1.4% ||              470ms   1.4% || (core)~1:0
 Array#[]                                                                                                                                                                                                  ||              360ms   1.1% ||              360ms   1.1% || (core)~1:0
 block in Struct._specialize                                                                                                                                                                               ||              370ms   1.1% ||              350ms   1.1% || resource:/truffleruby/core/struct.rb~469:12172-12196

15.2% in Lexer#advance, 3.5% in spec/rubocop/ast/traversal_spec.rb:78

Lexer#advance is huge, it's 13525 lines long (from the output: ../../.rubies/truffleruby-dev/lib/gems/gems/parser-3.1.2.1/lib/parser/lexer.rb:11303-24828) That likely means we can't compile it, it's simply too big to be JIT-compiled.

With TRUFFLERUBYOPT="--cpusampler --cpusampler.ShowTiers=true" bundle exec rake spec:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 3055 samples with period 10ms. Missed 9 samples.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  T0: Percent of time spent in interpreter.
  T1: Percent of time spent in code compiled by tier 1 compiler.
  T2: Percent of time spent in code compiled by tier 2 compiler.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread[main,5,main]
 Name                                                                                                                                                           ||             Total Time    |   T0   |   T1   |   T2   ||              Self Time    |   T0   |   T1   |   T2   || Location             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Lexer#advance                                                                                                                                                  ||             4870ms  15.9% | 100.0% |   0.0% |   0.0% ||             2310ms   7.6% | 100.0% |   0.0% |   0.0% || ../../.rubies/truffleruby-dev/lib/gems/gems/parser-3.1.2.1/lib/parser/lexer.rb~11303-24828:470723-833042
 BasicObject#!=                                                                                                                                                 ||             1680ms   5.5% |   7.1% |  92.9% |   0.0% ||             1580ms   5.2% |   2.5% |  97.5% |   0.0% || (core)~1:0
 block (6 levels) in <top (required)>                                                                                                                           ||             1040ms   3.4% | 100.0% |   0.0% |   0.0% ||             1040ms   3.4% | 100.0% |   0.0% |   0.0% || spec/rubocop/ast/traversal_spec.rb~78:1642-1692
 rb_ary_entry                                                                                                                                                   ||             1210ms   4.0% |   8.3% |  32.2% |  59.5% ||              910ms   3.0% |   4.4% |  20.9% |  74.7% || array.c~81:0

It's clear indeed it's never compiled, because too big (with TRUFFLERUBYOPT="--engine.TraceCompilation"):

[engine] opt failed id=53879 Lexer#advance                                      |Tier 1|Time  890(-19232621+19233511)ms|Reason: org.graalvm.compiler.truffle.compiler.GraphTooBigBailoutException: Graph too big to safely compile. Node count: 115479. Graph Size: 150002. Limit: 150000.|Timestamp 19233510760952|Src lexer.rb:11303

Graph Size: 150002 is simply because it stopped partial evaluating when going over the limit, the full graph is way bigger most likely.

So we should look into what options ragel has to generate methods of a more reasonable size, and PR to the parser gem: https://github.com/whitequark/parser/issues/871

Previous references to this: https://github.com/oracle/truffleruby/issues/2081#issuecomment-682467901

Oct 13 '22 15:10 eregon

With TRUFFLERUBYOPT="--engine.Mode=latency" it goes down to:

what are the recommendations when to use such option?

Oct 13 '22 16:10 gogainda

what are the recommendations when to use such option?

In short, for test suites it's typically worth trying.

In general --engine.Mode=latency optimizes for latency at the expense of throughput. The default is the reverse. Concretely it compiles faster but less optimized. So peak performance is lower, but startup and warmup are faster. For specs and test suites, especially when running for a short time like here, most code doesn't get to peak performance and so it's mostly partially-warmed up code, a large part run in interpreter. The flag changes to only compiles in Tier 1, no Tier 2 compilation and so almost no inlining and it disables Splitting (copying methods per call site when detecting they would benefit from it for better compilation). For workloads like a short-running test suite where there is not enough time to compile in Tier 2 relevant methods (or it's not possible to compile them because many methods are executed just once or a few times and it's a lot of new code), it's better to only compile in Tier 1 as many relevant methods as possible, so we spend less time in interpreter.

Oct 13 '22 17:10 eregon

PR to use TRUFFLERUBYOPT=--engine.Mode=latency: https://github.com/rubocop/rubocop-ast/pull/241
Issue about Parser::Lexer#advance: https://github.com/whitequark/parser/issues/871

With that let's close this issue.

Oct 14 '22 12:10 eregon

truffleruby truffleruby copied to clipboard

Slow specs

truffleruby
truffleruby copied to clipboard