jruby
jruby copied to clipboard
Slow performance on YJIT's HexaPDF benchmark
While investigating some HexaPDF test failures, I also played around with the benchmark used by the YJIT team:
https://github.com/Shopify/yjit-bench/tree/main/benchmarks/hexapdf
The performance on JRuby is unusually slow, so I'm guessing something is wrong:
#22: 2017ms
#23: 1975ms
#24: 1969ms
#25: 2056ms
Compared to Ruby 3.4:
#22: 1506ms
#23: 1499ms
#24: 1499ms
#25: 1490ms
And with YJIT:
#22: 762ms
#23: 833ms
#24: 750ms
#25: 833ms
Interestingly, JRuby has a lower RSS than either run of CRuby (679MiB vs 797MiB and 966MiB), so that's good, but we're clearly missing some optimizations that CRuby has added.
Looks like it's using refinements for some numeric operations, which is doubly bad because it defeats our other numeric optimizations and refinements are totally unoptimized at present.
A trace from JMC:
And the offending refinement in HexaPDF:
https://github.com/gettalong/hexapdf/blob/6cd6dccbc885a5e0c13b060af2a726b20325e727/lib/hexapdf/layout/numeric_refinements.rb#L43
Refinement of Numeric is likely the largest problem here, since it breaks almost all of our optimizations for numeric operations.
Replacing the refinement with a hard monkey patch eliminates most of the numeric overhead, with numbers approaching 1.3s at times.
The new culprit is millions of calls to this width method that uses a block to sum up item sizes based on the current style (I think):
https://github.com/gettalong/hexapdf/blob/6cd6dccbc885a5e0c13b060af2a726b20325e727/lib/hexapdf/layout/text_fragment.rb#L329-L331
Here the problem is that this code is called extremely frequently and creates a zero-width DynamicScope every time, even though the value may already have been calculated. Moving the RHS to a separate method, so it does not repeatedly require a caller scope, does appear to help, but other methods with heavy block use seem to rise to the top.
The top two methods in a JMC profile appear to be two blocks.
Here in TextLayouter.call:
https://github.com/gettalong/hexapdf/blob/6cd6dccbc885a5e0c13b060af2a726b20325e727/lib/hexapdf/layout/text_layouter.rb#L231
and here in TextLayouter#fit:
https://github.com/gettalong/hexapdf/blob/6cd6dccbc885a5e0c13b060af2a726b20325e727/lib/hexapdf/layout/text_layouter.rb#L746
They are quite big so it will require more digging to figure out what's slower than it should be.
On a memory profile at this point, the number one allocation is RubyFixnum, in response to a loop increment in the call method mentioned above and in response to hash calls here:
https://github.com/gettalong/hexapdf/blob/6cd6dccbc885a5e0c13b060af2a726b20325e727/lib/hexapdf/layout/text_fragment.rb#L194
Strangely, running with -X+C causes a NoMethodError where there should not be one:
$ jruby -X+C benchmark.rb
jruby 10.0.1.0-SNAPSHOT (3.4.2) 2025-07-07 7c668aba88 OpenJDK 64-Bit Server VM 21.0.5+11-LTS on 21.0.5+11-LTS +indy +jit [arm64-darwin]
Command: bundle check 2> /dev/null || bundle install
The Gemfile's dependencies are satisfied
NoMethodError: undefined method 'flag' for an instance of HexaPDF::Dictionary
create_pdf_object at /Users/headius/work/jruby/lib/ruby/gems/shared/gems/hexapdf-0.16.0/lib/hexapdf/font/type1_wrapper.rb:227
initialize at /Users/headius/work/jruby/lib/ruby/gems/shared/gems/hexapdf-0.16.0/lib/hexapdf/font/type1_wrapper.rb:121
...
Bug was introduced at 4b897eec13b4d468ee438885d81ddba16d02aebc but is hidden at boot because Hexpdf has a rescue nil that captures too much.
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
buildString at org/jruby/ir/targets/indy/BuildDynamicStringSite.java:507
️❤ def bit_field #0 at Users/headius/work/jruby/lib/ruby/gems/shared/gems/hexapdf_minus_0_dot_16_dot_0/lib/hexapdf/utils//Users/headius/work/jruby/lib/ruby/gems/shared/gems/hexapdf-0.16.0/lib/hexapdf/utils/bit_field.rb:75
I'm looking into it.
The flag bug is fixed in #8896.