truffleruby Jekyll Performance slower than MRI after warm up

Hi Using Jekyll, and keeping the process running so truffle has lots of warm-up time I found that MRI Ruby is faster. After 20x builds it is getting closer but MRI Ruby is at 0.5 seconds and Truffle --jvm is at ~0.7 and Truffle --native is ~0.9s. Perhaps that is within expectations, I'm not sure, but I put together a repo to demonstrate.

Here is my test repo

May 27 '21 18:05 adamgordonbell

Thank you for the report. I still need to read the related blog post (I just skimmed it so far).

Regarding profiling, there is some documentation https://github.com/oracle/truffleruby/blob/master/doc/user/tools.md (the easier is just adding --cpusampler). Those docs seem to miss to mention that's it's possible to profile with Chrome Inspector and get a flamechart there.

May 28 '21 10:05 eregon

2.7 | 33.7 s 3.0 | 44.1 s

If that's reproducible it seems worth reporting to CRuby too, that's ~31% slower.

May 28 '21 12:05 eregon

Truffle Ruby didn't work with Kramdown, so pandoc is used for markdown in this test.

Do you have backtrace or output? How can I reproduce that?

May 28 '21 12:05 eregon

What would be really useful here is to have a single Ruby process rendering multiple times, without waiting any time in between, otherwise it gets quite messy to profile, because most of the time is spent sleeping or waiting.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                                           |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread#join                                                                    |      93764ms  94.5% |   0.0% ||      93763ms  94.5% |   0.0% | (core)~1:0 
 FFI::Function#call                                                             |        571ms   0.6% |   0.0% ||        561ms   0.6% |   0.0% | ../../../.rubies/truffleruby-dev/lib/truffle/truffle/ffi_backend/function.rb~68-93:2961-3831 
 Kernel#gem_original_require                                                    |       1109ms   1.1% |   0.0% ||        255ms   0.3% |   0.0% | resource:/truffleruby/core/kernel.rb~226-238:7092-7472 
 yaml_parser_update_buffer                                                      |        186ms   0.2% |  21.5% ||        178ms   0.2% |  20.2% | reader.c~142:0 
 yaml_parser_scan_plain_scalar                                                  |        260ms   0.3% |   0.4% ||        143ms   0.1% |   0.7% | scanner.c~3389:0 
 yaml_parser_scan_flow_scalar                                                   |        164ms   0.2% |   1.8% ||        123ms   0.1% |   1.6% | scanner.c~3016:0 
 tokenizer_next                                                                 |        122ms   0.1% |   0.0% ||        122ms   0.1% |   0.0% | ../../../.rubies/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~58:1435-1493 
 Kernel#require_relative                                                        |        808ms   0.8% |   0.0% ||         87ms   0.1% |   0.0% | resource:/truffleruby/core/kernel.rb~283-297:9145-9652 
 Marshal::State#construct                                                       |        242ms   0.2% |  43.0% ||         63ms   0.1% |   3.2% | resource:/truffleruby/core/marshal.rb~561-654:14617-17114 
 parse                                                                          |       1194ms   1.2% |   0.0% ||         62ms   0.1% |   0.0% | psych_parser.c~253:0

FWIW, I ran manually with:

bundle exec ruby --cpusampler -S jekyll serve --incremental -H 127.0.0.1 -P 4001

(from the test-two-seven directory to make sure I run the same code as CRuby 2.7) and then in another shell, multiple invocations of:

for i in {1..10}; do echo $i; echo "1\n" >> ./_posts/2009-05-15-edge-case-nested-and-mixed-lists.md; sleep 1; done

I notice a lot of threads created by Process.detach, so probably there are quite a lot of subprocesses. Process.detach also feels like an anti-pattern, I wonder what uses it.

May 28 '21 12:05 eregon

Thanks for looking into this!

What would be really useful here is to have a single Ruby process rendering multiple times, without waiting any time in between, otherwise it gets quite messy to profile, because most of the time is spent sleeping or waiting.

Yeah, my profiling method could be improved, but I'm not sure about the best way to tackle that. This was just a dirty way to be able to measure render time after warm up.

Do you have backtrace or output? How can I reproduce that?

This is the error I was getting:

[To redirect Truffle log output to a file use one of the following options:
* '--log.file=<path>' if the option is passed using a guest language launcher.
* '-Dpolyglot.log.file=<path>' if the option is passed using the host Java launcher.
* Configure logging using the polyglot embedding API.]
[ruby] WARNING: /Users/adam/.rbenv/versions/truffleruby-dev/lib/gems/gems/kramdown-2.3.1/lib/kramdown/parser/kramdown.rb:223: unstable interpolated regexps are not optimized

And then things were hanging. But I can't reproduce this anymore, now it seems to work fine but occasionally give that warning. So maybe that was a user error. This branch has it switched back to kramdown.

It's possible that kramdown is a better test case because pandoc is not a ruby program and kramdown is. Perhaps that is all the Proces detach - calling pandoc?

Thanks for the feedback on the post BTW. I added your commentary as an update to the bottom.

May 28 '21 13:05 adamgordonbell

Here is what I got using --cpusampler with kramdown:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 287302 samples with period 1ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  Opt %: Percent of time spent in compiled and therefore non-interpreted code.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[Ruby Thread id=14 from Ruby-reference-processor,5,main]
 Name                                                                           |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 block in CExt#data_finalizer                                                   |       3621ms   1.3% |  69.2% ||       3565ms   1.2% |  69.1% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/cext.rb~1452-1454:33041-33164 
 block in CExt#data_marker                                                      |         89ms   0.0% |   0.0% ||         85ms   0.0% |   0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/cext.rb~1460-1464:33257-33449 
 yaml_free                                                                      |         24ms   0.0% | 100.0% ||         24ms   0.0% | 100.0% | api.c~51:0 
 yaml_parser_delete                                                             |         45ms   0.0% |  80.0% ||         18ms   0.0% |  72.2% | api.c~215:0 
 ruby_xfree                                                                     |          6ms   0.0% |  83.3% ||          6ms   0.0% |  83.3% | alloc.c~60:0 
 dealloc                                                                        |         55ms   0.0% |  74.5% ||          4ms   0.0% |  50.0% | psych_parser.c~43:0 
 yaml_token_delete                                                              |          3ms   0.0% |  33.3% ||          3ms   0.0% |  33.3% | api.c~579:0 
 rb_gc_mark                                                                     |          2ms   0.0% |   0.0% ||          2ms   0.0% |   0.0% | gc.c~21:0 
 tokenizer_mark                                                                 |          4ms   0.0% |   0.0% ||          2ms   0.0% |   0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~6:69-105 
 tokenizer_free                                                                 |          1ms   0.0% |   0.0% ||          1ms   0.0% |   0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~12:181-217 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

May 28 '21 14:05 adamgordonbell

Thanks, that warning is just a performance warning, it shouldn't affect any behavior. It comes from https://github.com/gettalong/kramdown/blob/REL_2_3_1/lib/kramdown/parser/kramdown.rb#L201 So apparently that code is hot enough to compile, there are many different Regexps created, and that's pretty expensive.

The --cpusampler output above is only for the Ruby-reference-processor Thread (which runs finalizers, etc). I think the actual work happens on the main thread here (Thread[main,N,main]).

May 28 '21 14:05 eregon

results.txt Here is the full thing from my run.

May 28 '21 14:05 adamgordonbell

Kernel.sleep                                                                   |      13768ms   4.8%

Lol

May 28 '21 15:05 chrisseaton

Thanks, it's somewhat similar to the profile I posted above:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                                           |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread#join                                                                    |     263063ms  91.5% |   0.0% ||     263063ms  91.5% |   0.0% | (core)~1:0 
 Truffle::RegexpOperations.match_in_region                                      |        881ms   0.3% |  98.0% ||        799ms   0.3% |  97.7% | resource:/truffleruby/core/truffle/regexp_operations.rb~67-75:2236-2687 
 Kernel#gem_original_require                                                    |       3057ms   1.1% |  34.4% ||        489ms   0.2% |   9.6% | resource:/truffleruby/core/kernel.rb~226-238:7092-7472 
 Truffle::KernelOperations.load                                                 |     285483ms  99.3% |   0.0% ||        451ms   0.2% |   0.0% | (core)~1:0 
 Jekyll::Cache#getset                                                           |       7269ms   2.5% |   8.6% ||        451ms   0.2% |  33.5% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/jekyll-4.2.0/lib/jekyll/cache.rb~108-114:2611-2734 
 block in FileUtils#mkdir_p                                                     |        656ms   0.2% |  19.8% ||        419ms   0.1% |  17.4% | ../../../.rbenv/versions/truffleruby-dev/lib/mri/fileutils.rb~211-226:6392-6885 
 Liquid::Context#find_variable                                                  |       1275ms   0.4% |  40.9% ||        324ms   0.1% |  40.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~163-190:4673-5704 
 block in Liquid::VariableLookup#evaluate                                       |       1585ms   0.6% |  51.7% ||        303ms   0.1% |  38.3% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/variable_lookup.rb~38-66:853-2237 
 Hash#key?                                                                      |        281ms   0.1% |  26.7% ||        232ms   0.1% |  22.0% | resource:/truffleruby/core/hash.rb~429-431:10943-11033 
 Truffle::StringOperations.gsub_internal_core                                   |        902ms   0.3% |  61.6% ||        223ms   0.1% |  35.0% | resource:/truffleruby/core/truffle/string_operations.rb~70-124:1949-3806 
 yaml_parser_update_buffer                                                      |        228ms   0.1% |  22.8% ||        221ms   0.1% |  21.3% | reader.c~142:0 
 Liquid::Context#lookup_and_evaluate                                            |       1100ms   0.4% |  49.9% ||        214ms   0.1% |  32.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~192-204:5707-6131 
 yaml_parser_scan_plain_scalar                                                  |        358ms   0.1% |   1.1% ||        201ms   0.1% |   1.0% | scanner.c~3389:0 
 Truffle::StringOperations.byte_index                                           |        197ms   0.1% |  96.4% ||        197ms   0.1% |  96.4% | resource:/truffleruby/core/truffle/string_operations.rb~273-285:8643-9047 
 FFI::Function#call                                                             |        202ms   0.1% |   0.0% ||        197ms   0.1% |   0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/ffi_backend/function.rb~68-93:2961-3831 
 StringScanner#scan_internal                                                    |        697ms   0.2% |  65.3% ||        179ms   0.1% |  20.7% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/strscan.rb~311-329:6967-7537 
 yaml_parser_scan_flow_scalar                                                   |        244ms   0.1% |   1.2% ||        175ms   0.1% |   0.6% | scanner.c~3016:0 
 Truffle::POSIX.open                                                            |        178ms   0.1% |  76.4% ||        171ms   0.1% |  76.6% | resource:/truffleruby/core/posix.rb~134-168:3839-5070 
 Truffle::Type.check_funcall                                                    |        498ms   0.2% |   6.6% ||        164ms   0.1% |   7.3% | resource:/truffleruby/core/type.rb~296-313:10151-10667 
 Jekyll::Drops::Drop#[]                                                         |        824ms   0.3% |  44.4% ||        159ms   0.1% |  32.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/jekyll-4.2.0/lib/jekyll/drops/drop.rb~124-132:4241-4466 
 Kernel#require_relative                                                        |       1406ms   0.5% |   0.0% ||        147ms   0.1% |   0.0% | resource:/truffleruby/core/kernel.rb~283-297:9145-9652 
 Liquid::Context#evaluate                                                       |       3107ms   1.1% |  62.4% ||        144ms   0.1% |  52.8% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~158-160:4486-4586 
 Truffle::FeatureLoader.find_file                                               |        145ms   0.1% |  29.0% ||        132ms   0.0% |  31.1% | resource:/truffleruby/core/truffle/feature_loader.rb~75-78:2358-2501 
 parse                                                                          |       2231ms   0.8% |   0.6% ||        128ms   0.0% |   0.0% | psych_parser.c~253:0 
 Liquid::VariableLookup#evaluate                                                |       3009ms   1.0% |  60.4% ||        123ms   0.0% |  60.2% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/variable_lookup.rb~34-70:746-2269 
 Truffle::Splitter.split                                                        |        263ms   0.1% |  58.9% ||        118ms   0.0% |  77.1% | resource:/truffleruby/core/splitter.rb~42-91:2000-3846 
 Hash#default                                                                   |        178ms   0.1% |   7.9% ||        118ms   0.0% |  11.0% | resource:/truffleruby/core/hash.rb~187-193:5746-5922 
 String#gsub                                                                    |        784ms   0.3% |  44.5% ||        116ms   0.0% |  10.3% | resource:/truffleruby/core/string.rb~938-951:24494-25038 
 Liquid::BlockBody#render                                                       |       9667ms   3.4% |  84.1% ||        114ms   0.0% |  67.5% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/block_body.rb~69-98:2125-3145 
 Truffle::Type.object_respond_to_no_built_in?                                   |        108ms   0.0% |   0.9% ||        108ms   0.0% |   0.9% | resource:/truffleruby/core/type.rb~65-68:3461-3664 
 rb_ary_store                                                                   |        127ms   0.0% |  29.9% ||        106ms   0.0% |  30.2% | array.c~82:0 
 Truffle::Type.check_funcall_missing                                            |        213ms   0.1% |   8.9% ||        106ms   0.0% |  12.3% | resource:/truffleruby/core/type.rb~328-352:10969-11713 
 Truffle::POSIX.write                                                           |        106ms   0.0% |   0.0% ||        105ms   0.0% |   0.0% | resource:/truffleruby/core/posix.rb~134-168:3839-5070 
 block in Array#join                                                            |        192ms   0.1% |  43.2% ||        104ms   0.0% |  41.3% | resource:/truffleruby/core/array.rb~591-612:14164-14693 
 Liquid::BlockBody#render_node_to_output                                        |       9642ms   3.4% |  74.4% ||        102ms   0.0% |  55.9% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/block_body.rb~102-115:3161-3787 
 Enumerable#find_index                                                          |        411ms   0.1% |  38.7% ||         98ms   0.0% |  44.9% | resource:/truffleruby/core/enumerable.rb~663-683:15960-16388 
 Truffle::POSIX.mkdir                                                           |        105ms   0.0% |  11.4% ||         95ms   0.0% |  10.5% | resource:/truffleruby/core/posix.rb~134-168:3839-5070 
 Array#flatten!                                                                 |        441ms   0.2% |   9.5% ||         93ms   0.0% |   5.4% | resource:/truffleruby/core/array.rb~453-466:10552-10871 
 Truffle::POSIX.truffleposix_stat_mode                                          |        104ms   0.0% |  81.7% ||         92ms   0.0% |  83.7% | resource:/truffleruby/core/posix.rb~134-168:3839-5070 
 block in Kramdown::Parser::GFM#parse_list                                      |        132ms   0.0% |   0.0% ||         91ms   0.0% |   0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/kramdown-parser-gfm-1.1.0/lib/kramdown/parser/gfm.rb~165:5363-5459

It's interesting that Truffle::RegexpOperations.match_in_region seems the hottest there. Unfortunately 91.5% is spent in Thread#join, I guess waiting for filesystem changes from the 2 listen Threads, so those numbers are all very small and hard to compare.

@chrisseaton That Kernel#sleep is from the listen Thread it seems.

May 28 '21 15:05 eregon

I'd like to look more this, but realistically I think we need a way to render the website (or a page) repetitively without any pause in between, otherwise it's very difficult to profile. If anyone wants to look into that, i.e., a script rendering the website or some page repeatedly, that would be helpful.

jekyll build is probably not a great use case for JIT if it executes in a few seconds anyway, but for a bigger website it could pay off. For the rebuilding server use case like here it should pay off to run with a Ruby JIT, if not restarted too often (e.g. due to _config.yml changes).

Jun 04 '21 16:06 eregon

Hi @eregon, The YJIT Ruby folks converted my example to a proper benchmark here: https://github.com/Shopify/yjit-bench/pull/21

Jun 08 '21 18:06 adamgordonbell

truffleruby truffleruby copied to clipboard

Jekyll Performance slower than MRI after warm up

truffleruby
truffleruby copied to clipboard