truffleruby
truffleruby copied to clipboard
Jekyll Performance slower than MRI after warm up
Hi Using Jekyll, and keeping the process running so truffle has lots of warm-up time I found that MRI Ruby is faster. After 20x builds it is getting closer but MRI Ruby is at 0.5 seconds and Truffle --jvm is at ~0.7 and Truffle --native is ~0.9s. Perhaps that is within expectations, I'm not sure, but I put together a repo to demonstrate.
Here is my test repo
Thank you for the report. I still need to read the related blog post (I just skimmed it so far).
Regarding profiling, there is some documentation https://github.com/oracle/truffleruby/blob/master/doc/user/tools.md (the easier is just adding --cpusampler). Those docs seem to miss to mention that's it's possible to profile with Chrome Inspector and get a flamechart there.
2.7 | 33.7 s 3.0 | 44.1 s
If that's reproducible it seems worth reporting to CRuby too, that's ~31% slower.
Truffle Ruby didn't work with Kramdown, so pandoc is used for markdown in this test.
Do you have backtrace or output? How can I reproduce that?
What would be really useful here is to have a single Ruby process rendering multiple times, without waiting any time in between, otherwise it gets quite messy to profile, because most of the time is spent sleeping or waiting.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread: Thread[main,5,main]
Name | Total Time | Opt % || Self Time | Opt % | Location
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread#join | 93764ms 94.5% | 0.0% || 93763ms 94.5% | 0.0% | (core)~1:0
FFI::Function#call | 571ms 0.6% | 0.0% || 561ms 0.6% | 0.0% | ../../../.rubies/truffleruby-dev/lib/truffle/truffle/ffi_backend/function.rb~68-93:2961-3831
Kernel#gem_original_require | 1109ms 1.1% | 0.0% || 255ms 0.3% | 0.0% | resource:/truffleruby/core/kernel.rb~226-238:7092-7472
yaml_parser_update_buffer | 186ms 0.2% | 21.5% || 178ms 0.2% | 20.2% | reader.c~142:0
yaml_parser_scan_plain_scalar | 260ms 0.3% | 0.4% || 143ms 0.1% | 0.7% | scanner.c~3389:0
yaml_parser_scan_flow_scalar | 164ms 0.2% | 1.8% || 123ms 0.1% | 1.6% | scanner.c~3016:0
tokenizer_next | 122ms 0.1% | 0.0% || 122ms 0.1% | 0.0% | ../../../.rubies/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~58:1435-1493
Kernel#require_relative | 808ms 0.8% | 0.0% || 87ms 0.1% | 0.0% | resource:/truffleruby/core/kernel.rb~283-297:9145-9652
Marshal::State#construct | 242ms 0.2% | 43.0% || 63ms 0.1% | 3.2% | resource:/truffleruby/core/marshal.rb~561-654:14617-17114
parse | 1194ms 1.2% | 0.0% || 62ms 0.1% | 0.0% | psych_parser.c~253:0
FWIW, I ran manually with:
bundle exec ruby --cpusampler -S jekyll serve --incremental -H 127.0.0.1 -P 4001
(from the test-two-seven directory to make sure I run the same code as CRuby 2.7)
and then in another shell, multiple invocations of:
for i in {1..10}; do echo $i; echo "1\n" >> ./_posts/2009-05-15-edge-case-nested-and-mixed-lists.md; sleep 1; done
I notice a lot of threads created by Process.detach, so probably there are quite a lot of subprocesses.
Process.detach also feels like an anti-pattern, I wonder what uses it.
Thanks for looking into this!
What would be really useful here is to have a single Ruby process rendering multiple times, without waiting any time in between, otherwise it gets quite messy to profile, because most of the time is spent sleeping or waiting.
Yeah, my profiling method could be improved, but I'm not sure about the best way to tackle that. This was just a dirty way to be able to measure render time after warm up.
Do you have backtrace or output? How can I reproduce that?
This is the error I was getting:
[To redirect Truffle log output to a file use one of the following options:
* '--log.file=<path>' if the option is passed using a guest language launcher.
* '-Dpolyglot.log.file=<path>' if the option is passed using the host Java launcher.
* Configure logging using the polyglot embedding API.]
[ruby] WARNING: /Users/adam/.rbenv/versions/truffleruby-dev/lib/gems/gems/kramdown-2.3.1/lib/kramdown/parser/kramdown.rb:223: unstable interpolated regexps are not optimized
And then things were hanging. But I can't reproduce this anymore, now it seems to work fine but occasionally give that warning. So maybe that was a user error. This branch has it switched back to kramdown.
It's possible that kramdown is a better test case because pandoc is not a ruby program and kramdown is. Perhaps that is all the Proces detach - calling pandoc?
Thanks for the feedback on the post BTW. I added your commentary as an update to the bottom.
Here is what I got using --cpusampler with kramdown:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 287302 samples with period 1ms.
Self Time: Time spent on the top of the stack.
Total Time: Time spent somewhere on the stack.
Opt %: Percent of time spent in compiled and therefore non-interpreted code.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread: Thread[Ruby Thread id=14 from Ruby-reference-processor,5,main]
Name | Total Time | Opt % || Self Time | Opt % | Location
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
block in CExt#data_finalizer | 3621ms 1.3% | 69.2% || 3565ms 1.2% | 69.1% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/cext.rb~1452-1454:33041-33164
block in CExt#data_marker | 89ms 0.0% | 0.0% || 85ms 0.0% | 0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/cext.rb~1460-1464:33257-33449
yaml_free | 24ms 0.0% | 100.0% || 24ms 0.0% | 100.0% | api.c~51:0
yaml_parser_delete | 45ms 0.0% | 80.0% || 18ms 0.0% | 72.2% | api.c~215:0
ruby_xfree | 6ms 0.0% | 83.3% || 6ms 0.0% | 83.3% | alloc.c~60:0
dealloc | 55ms 0.0% | 74.5% || 4ms 0.0% | 50.0% | psych_parser.c~43:0
yaml_token_delete | 3ms 0.0% | 33.3% || 3ms 0.0% | 33.3% | api.c~579:0
rb_gc_mark | 2ms 0.0% | 0.0% || 2ms 0.0% | 0.0% | gc.c~21:0
tokenizer_mark | 4ms 0.0% | 0.0% || 2ms 0.0% | 0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~6:69-105
tokenizer_free | 1ms 0.0% | 0.0% || 1ms 0.0% | 0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-c-4.0.0/ext/liquid_c/tokenizer.c~12:181-217
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thanks, that warning is just a performance warning, it shouldn't affect any behavior. It comes from https://github.com/gettalong/kramdown/blob/REL_2_3_1/lib/kramdown/parser/kramdown.rb#L201 So apparently that code is hot enough to compile, there are many different Regexps created, and that's pretty expensive.
The --cpusampler output above is only for the Ruby-reference-processor Thread (which runs finalizers, etc).
I think the actual work happens on the main thread here (Thread[main,N,main]).
results.txt Here is the full thing from my run.
Kernel.sleep | 13768ms 4.8%
Lol
Thanks, it's somewhat similar to the profile I posted above:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread: Thread[main,5,main]
Name | Total Time | Opt % || Self Time | Opt % | Location
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread#join | 263063ms 91.5% | 0.0% || 263063ms 91.5% | 0.0% | (core)~1:0
Truffle::RegexpOperations.match_in_region | 881ms 0.3% | 98.0% || 799ms 0.3% | 97.7% | resource:/truffleruby/core/truffle/regexp_operations.rb~67-75:2236-2687
Kernel#gem_original_require | 3057ms 1.1% | 34.4% || 489ms 0.2% | 9.6% | resource:/truffleruby/core/kernel.rb~226-238:7092-7472
Truffle::KernelOperations.load | 285483ms 99.3% | 0.0% || 451ms 0.2% | 0.0% | (core)~1:0
Jekyll::Cache#getset | 7269ms 2.5% | 8.6% || 451ms 0.2% | 33.5% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/jekyll-4.2.0/lib/jekyll/cache.rb~108-114:2611-2734
block in FileUtils#mkdir_p | 656ms 0.2% | 19.8% || 419ms 0.1% | 17.4% | ../../../.rbenv/versions/truffleruby-dev/lib/mri/fileutils.rb~211-226:6392-6885
Liquid::Context#find_variable | 1275ms 0.4% | 40.9% || 324ms 0.1% | 40.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~163-190:4673-5704
block in Liquid::VariableLookup#evaluate | 1585ms 0.6% | 51.7% || 303ms 0.1% | 38.3% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/variable_lookup.rb~38-66:853-2237
Hash#key? | 281ms 0.1% | 26.7% || 232ms 0.1% | 22.0% | resource:/truffleruby/core/hash.rb~429-431:10943-11033
Truffle::StringOperations.gsub_internal_core | 902ms 0.3% | 61.6% || 223ms 0.1% | 35.0% | resource:/truffleruby/core/truffle/string_operations.rb~70-124:1949-3806
yaml_parser_update_buffer | 228ms 0.1% | 22.8% || 221ms 0.1% | 21.3% | reader.c~142:0
Liquid::Context#lookup_and_evaluate | 1100ms 0.4% | 49.9% || 214ms 0.1% | 32.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~192-204:5707-6131
yaml_parser_scan_plain_scalar | 358ms 0.1% | 1.1% || 201ms 0.1% | 1.0% | scanner.c~3389:0
Truffle::StringOperations.byte_index | 197ms 0.1% | 96.4% || 197ms 0.1% | 96.4% | resource:/truffleruby/core/truffle/string_operations.rb~273-285:8643-9047
FFI::Function#call | 202ms 0.1% | 0.0% || 197ms 0.1% | 0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/truffle/ffi_backend/function.rb~68-93:2961-3831
StringScanner#scan_internal | 697ms 0.2% | 65.3% || 179ms 0.1% | 20.7% | ../../../.rbenv/versions/truffleruby-dev/lib/truffle/strscan.rb~311-329:6967-7537
yaml_parser_scan_flow_scalar | 244ms 0.1% | 1.2% || 175ms 0.1% | 0.6% | scanner.c~3016:0
Truffle::POSIX.open | 178ms 0.1% | 76.4% || 171ms 0.1% | 76.6% | resource:/truffleruby/core/posix.rb~134-168:3839-5070
Truffle::Type.check_funcall | 498ms 0.2% | 6.6% || 164ms 0.1% | 7.3% | resource:/truffleruby/core/type.rb~296-313:10151-10667
Jekyll::Drops::Drop#[] | 824ms 0.3% | 44.4% || 159ms 0.1% | 32.7% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/jekyll-4.2.0/lib/jekyll/drops/drop.rb~124-132:4241-4466
Kernel#require_relative | 1406ms 0.5% | 0.0% || 147ms 0.1% | 0.0% | resource:/truffleruby/core/kernel.rb~283-297:9145-9652
Liquid::Context#evaluate | 3107ms 1.1% | 62.4% || 144ms 0.1% | 52.8% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/context.rb~158-160:4486-4586
Truffle::FeatureLoader.find_file | 145ms 0.1% | 29.0% || 132ms 0.0% | 31.1% | resource:/truffleruby/core/truffle/feature_loader.rb~75-78:2358-2501
parse | 2231ms 0.8% | 0.6% || 128ms 0.0% | 0.0% | psych_parser.c~253:0
Liquid::VariableLookup#evaluate | 3009ms 1.0% | 60.4% || 123ms 0.0% | 60.2% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/variable_lookup.rb~34-70:746-2269
Truffle::Splitter.split | 263ms 0.1% | 58.9% || 118ms 0.0% | 77.1% | resource:/truffleruby/core/splitter.rb~42-91:2000-3846
Hash#default | 178ms 0.1% | 7.9% || 118ms 0.0% | 11.0% | resource:/truffleruby/core/hash.rb~187-193:5746-5922
String#gsub | 784ms 0.3% | 44.5% || 116ms 0.0% | 10.3% | resource:/truffleruby/core/string.rb~938-951:24494-25038
Liquid::BlockBody#render | 9667ms 3.4% | 84.1% || 114ms 0.0% | 67.5% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/block_body.rb~69-98:2125-3145
Truffle::Type.object_respond_to_no_built_in? | 108ms 0.0% | 0.9% || 108ms 0.0% | 0.9% | resource:/truffleruby/core/type.rb~65-68:3461-3664
rb_ary_store | 127ms 0.0% | 29.9% || 106ms 0.0% | 30.2% | array.c~82:0
Truffle::Type.check_funcall_missing | 213ms 0.1% | 8.9% || 106ms 0.0% | 12.3% | resource:/truffleruby/core/type.rb~328-352:10969-11713
Truffle::POSIX.write | 106ms 0.0% | 0.0% || 105ms 0.0% | 0.0% | resource:/truffleruby/core/posix.rb~134-168:3839-5070
block in Array#join | 192ms 0.1% | 43.2% || 104ms 0.0% | 41.3% | resource:/truffleruby/core/array.rb~591-612:14164-14693
Liquid::BlockBody#render_node_to_output | 9642ms 3.4% | 74.4% || 102ms 0.0% | 55.9% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/liquid-4.0.3/lib/liquid/block_body.rb~102-115:3161-3787
Enumerable#find_index | 411ms 0.1% | 38.7% || 98ms 0.0% | 44.9% | resource:/truffleruby/core/enumerable.rb~663-683:15960-16388
Truffle::POSIX.mkdir | 105ms 0.0% | 11.4% || 95ms 0.0% | 10.5% | resource:/truffleruby/core/posix.rb~134-168:3839-5070
Array#flatten! | 441ms 0.2% | 9.5% || 93ms 0.0% | 5.4% | resource:/truffleruby/core/array.rb~453-466:10552-10871
Truffle::POSIX.truffleposix_stat_mode | 104ms 0.0% | 81.7% || 92ms 0.0% | 83.7% | resource:/truffleruby/core/posix.rb~134-168:3839-5070
block in Kramdown::Parser::GFM#parse_list | 132ms 0.0% | 0.0% || 91ms 0.0% | 0.0% | ../../../.rbenv/versions/truffleruby-dev/lib/gems/gems/kramdown-parser-gfm-1.1.0/lib/kramdown/parser/gfm.rb~165:5363-5459
It's interesting that Truffle::RegexpOperations.match_in_region seems the hottest there.
Unfortunately 91.5% is spent in Thread#join, I guess waiting for filesystem changes from the 2 listen Threads, so those numbers are all very small and hard to compare.
@chrisseaton That Kernel#sleep is from the listen Thread it seems.
I'd like to look more this, but realistically I think we need a way to render the website (or a page) repetitively without any pause in between, otherwise it's very difficult to profile. If anyone wants to look into that, i.e., a script rendering the website or some page repeatedly, that would be helpful.
jekyll build is probably not a great use case for JIT if it executes in a few seconds anyway, but for a bigger website it could pay off. For the rebuilding server use case like here it should pay off to run with a Ruby JIT, if not restarted too often (e.g. due to _config.yml changes).
Hi @eregon, The YJIT Ruby folks converted my example to a proper benchmark here: https://github.com/Shopify/yjit-bench/pull/21