truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

Fast blank is slower than Slow blank on Truffleruby

Open gogainda opened this issue 4 years ago • 10 comments

https://github.com/SamSaffron/fast_blank

================== Test String Length: 0 ==================
Warming up --------------------------------------
          Fast Blank    93.707k i/100ms
  Fast ActiveSupport   510.726k i/100ms
          Slow Blank    76.211k i/100ms
      New Slow Blank    22.244M i/100ms
Calculating -------------------------------------
          Fast Blank      5.361M (± 5.0%) i/s -     26.706M in   4.997583s
  Fast ActiveSupport      5.451M (± 2.9%) i/s -     27.579M in   5.063809s
          Slow Blank      1.297M (±66.5%) i/s -      3.125M in   5.072733s
      New Slow Blank    220.498M (± 8.1%) i/s -      1.090B in   5.007588s

Comparison:
      New Slow Blank: 220497717.1 i/s
  Fast ActiveSupport:  5451195.9 i/s - 40.45x  (± 0.00) slower
          Fast Blank:  5360771.1 i/s - 41.13x  (± 0.00) slower
          Slow Blank:  1296628.6 i/s - 170.05x  (± 0.00) slower


================== Test String Length: 6 ==================
Warming up --------------------------------------
          Fast Blank   168.425k i/100ms
  Fast ActiveSupport   183.225k i/100ms
          Slow Blank    78.492k i/100ms
      New Slow Blank     2.231M i/100ms
Calculating -------------------------------------
          Fast Blank      1.866M (± 3.8%) i/s -      9.432M in   5.063119s
  Fast ActiveSupport      1.847M (± 3.5%) i/s -      9.344M in   5.066771s
          Slow Blank      1.283M (±66.5%) i/s -      3.061M in   5.141647s
      New Slow Blank     20.021M (±12.9%) i/s -     98.156M in   5.011524s

Comparison:
      New Slow Blank: 20021393.6 i/s
          Fast Blank:  1865661.2 i/s - 10.73x  (± 0.00) slower
  Fast ActiveSupport:  1846637.6 i/s - 10.84x  (± 0.00) slower
          Slow Blank:  1282619.8 i/s - 15.61x  (± 0.00) slower


================== Test String Length: 14 ==================
Warming up --------------------------------------
          Fast Blank   383.840k i/100ms
  Fast ActiveSupport   396.259k i/100ms
          Slow Blank     2.476M i/100ms
      New Slow Blank    83.883k i/100ms
Calculating -------------------------------------
          Fast Blank      3.711M (± 6.8%) i/s -     18.808M in   5.097805s
  Fast ActiveSupport      3.893M (± 4.5%) i/s -     19.813M in   5.100463s
          Slow Blank     24.881M (± 5.0%) i/s -    126.293M in   5.090137s
      New Slow Blank      1.221M (±68.5%) i/s -      3.188M in   5.196091s

Comparison:
          Slow Blank: 24880544.8 i/s
  Fast ActiveSupport:  3892783.7 i/s - 6.39x  (± 0.00) slower
          Fast Blank:  3711304.1 i/s - 6.70x  (± 0.00) slower
      New Slow Blank:  1220939.0 i/s - 20.38x  (± 0.00) slower


================== Test String Length: 24 ==================
Warming up --------------------------------------
          Fast Blank   127.566k i/100ms
  Fast ActiveSupport   211.474k i/100ms
          Slow Blank     2.129M i/100ms
      New Slow Blank    55.351k i/100ms
Calculating -------------------------------------
          Fast Blank      2.145M (± 8.9%) i/s -     10.588M in   5.000647s
  Fast ActiveSupport      2.077M (± 6.3%) i/s -     10.362M in   5.010825s
          Slow Blank     19.853M (± 5.5%) i/s -    100.044M in   5.057535s
      New Slow Blank      1.337M (±61.4%) i/s -      3.100M in   5.086929s

Comparison:
          Slow Blank: 19852906.7 i/s
          Fast Blank:  2144579.7 i/s - 9.26x  (± 0.00) slower
  Fast ActiveSupport:  2076706.4 i/s - 9.56x  (± 0.00) slower
      New Slow Blank:  1337390.2 i/s - 14.84x  (± 0.00) slower


================== Test String Length: 136 ==================
Warming up --------------------------------------
          Fast Blank    68.189k i/100ms
  Fast ActiveSupport   111.426k i/100ms
          Slow Blank     1.927M i/100ms
      New Slow Blank    72.075k i/100ms
Calculating -------------------------------------
          Fast Blank      1.280M (± 8.8%) i/s -      6.342M in   5.019817s
  Fast ActiveSupport      1.113M (± 3.3%) i/s -      5.571M in   5.011350s
          Slow Blank     19.025M (± 2.4%) i/s -     96.352M in   5.067432s
      New Slow Blank      1.098M (±59.3%) i/s -      3.027M in   5.149704s

Comparison:
          Slow Blank: 19024883.4 i/s
          Fast Blank:  1279946.1 i/s - 14.86x  (± 0.00) slower
  Fast ActiveSupport:  1112982.8 i/s - 17.09x  (± 0.00) slower
      New Slow Blank:  1098161.1 i/s - 17.32x  (± 0.00) slower

@SamSaffron

gogainda avatar Sep 05 '21 21:09 gogainda

A minimal thing we can do with the gem @eregon is have a special switch of "if truffle, do no harm" happy for a PR in that department if the overhead here can not be removed.

SamSaffron avatar Sep 06 '21 01:09 SamSaffron

The benchmark is from https://github.com/SamSaffron/fast_blank/blob/master/benchmark The first 2 (lengths 0 and 6) are actually blank, the last 3 are not.

For length 0 & 6 (actually blank), New Slow Blank is fastest (expected) and Slow Blank is slowest (unexpected). For length 14-136 (not blank), Slow Blank seems fastest, and New Slow Blank is actually slowest (unexpected).

Note that latest ActiveSupport actually does something different a mix of Slow Blank and New Slow Blank + match?: https://github.com/rails/rails/blob/78e402077d4d69b08db7daf8944a32eef0cf6ac4/activesupport/lib/active_support/core_ext/object/blank.rb#L104-L127

BLANK_RE = /\A[[:space:]]*\z/
BLANK_RE.match?(self)

(TruffleRuby doesn't optimize match? specifically currently, maybe it should)

It would be worth measuring with what ActiveSupport does.

In general it's kind of expected the C extension is slower here as it does RSTRING_PTR (forces the string to native memory) + rb_enc_codepoint_len (likely not super fast to iterate the string), and OTOH Regexp are now just-in-time compiled on TruffleRuby with TRegex. So probably we should indeed skip the C extension.

But I think we should figure out if what is used in ActiveSupport is faster for all cases, and ensure it's as fast or faster than CRuby in all cases.

eregon avatar Sep 06 '21 13:09 eregon

this is what I've got:

================== Test String Length: 0 ==================
Warming up --------------------------------------
          Fast Blank    60.278k i/100ms
  Fast ActiveSupport   487.533k i/100ms
          Slow Blank    69.425k i/100ms
      New Slow Blank    23.118M i/100ms
     Latest AR Blank   207.506M i/100ms
Calculating -------------------------------------
          Fast Blank      5.409M (± 8.0%) i/s -     26.824M in   4.999122s
  Fast ActiveSupport      5.410M (± 4.0%) i/s -     27.302M in   5.054813s
          Slow Blank      1.279M (±61.7%) i/s -      2.846M in   5.077555s
      New Slow Blank    224.580M (± 9.7%) i/s -      1.110B in   5.025978s
     Latest AR Blank      2.053B (± 5.0%) i/s -     10.375B in   5.067722s

Comparison:
     Latest AR Blank: 2053059733.4 i/s
      New Slow Blank: 224580213.4 i/s - 9.14x  (± 0.00) slower
  Fast ActiveSupport:  5410117.6 i/s - 379.49x  (± 0.00) slower
          Fast Blank:  5408563.2 i/s - 379.59x  (± 0.00) slower
          Slow Blank:  1279073.0 i/s - 1605.12x  (± 0.00) slower


================== Test String Length: 6 ==================
Warming up --------------------------------------
          Fast Blank   171.133k i/100ms
  Fast ActiveSupport   190.379k i/100ms
          Slow Blank    72.840k i/100ms
      New Slow Blank     2.293M i/100ms
     Latest AR Blank    64.434k i/100ms
Calculating -------------------------------------
          Fast Blank      1.739M (±15.9%) i/s -      8.386M in   5.046484s
  Fast ActiveSupport      1.925M (± 3.9%) i/s -      9.709M in   5.051623s
          Slow Blank      1.226M (±59.6%) i/s -      3.059M in   5.227664s
      New Slow Blank     18.898M (±14.1%) i/s -     94.021M in   5.091155s
     Latest AR Blank      1.560M (±58.7%) i/s -      3.351M in   5.010152s

Comparison:
      New Slow Blank: 18898243.6 i/s
  Fast ActiveSupport:  1924981.5 i/s - 9.82x  (± 0.00) slower
          Fast Blank:  1738948.3 i/s - 10.87x  (± 0.00) slower
     Latest AR Blank:  1559954.3 i/s - 12.11x  (± 0.00) slower
          Slow Blank:  1226034.3 i/s - 15.41x  (± 0.00) slower


================== Test String Length: 14 ==================
Warming up --------------------------------------
          Fast Blank   170.897k i/100ms
  Fast ActiveSupport   368.944k i/100ms
          Slow Blank     2.523M i/100ms
      New Slow Blank    75.363k i/100ms
     Latest AR Blank     2.609M i/100ms
Calculating -------------------------------------
          Fast Blank      3.805M (± 7.2%) i/s -     18.970M in   5.014839s
  Fast ActiveSupport      3.765M (± 5.3%) i/s -     18.816M in   5.013121s
          Slow Blank     23.542M (± 3.2%) i/s -    118.585M in   5.043153s
      New Slow Blank      1.217M (±68.0%) i/s -      2.939M in   5.006626s
     Latest AR Blank     22.963M (±13.8%) i/s -    112.194M in   5.013917s

Comparison:
          Slow Blank: 23542338.1 i/s
     Latest AR Blank: 22963001.3 i/s - same-ish: difference falls within error
          Fast Blank:  3804589.0 i/s - 6.19x  (± 0.00) slower
  Fast ActiveSupport:  3764823.5 i/s - 6.25x  (± 0.00) slower
      New Slow Blank:  1217249.4 i/s - 19.34x  (± 0.00) slower


================== Test String Length: 24 ==================
Warming up --------------------------------------
          Fast Blank   199.905k i/100ms
  Fast ActiveSupport   212.155k i/100ms
          Slow Blank     2.228M i/100ms
      New Slow Blank    54.231k i/100ms
     Latest AR Blank     2.375M i/100ms
Calculating -------------------------------------
          Fast Blank      2.141M (± 4.9%) i/s -     10.795M in   5.054385s
  Fast ActiveSupport      2.148M (± 4.2%) i/s -     10.820M in   5.046501s
          Slow Blank     19.710M (± 3.4%) i/s -    100.279M in   5.093634s
      New Slow Blank      1.271M (±58.1%) i/s -      2.983M in   5.021893s
     Latest AR Blank     21.529M (±10.2%) i/s -    106.857M in   5.043917s

Comparison:
     Latest AR Blank: 21528759.2 i/s
          Slow Blank: 19710353.1 i/s - same-ish: difference falls within error
  Fast ActiveSupport:  2147989.0 i/s - 10.02x  (± 0.00) slower
          Fast Blank:  2141145.7 i/s - 10.05x  (± 0.00) slower
      New Slow Blank:  1270581.8 i/s - 16.94x  (± 0.00) slower


================== Test String Length: 136 ==================
Warming up --------------------------------------
          Fast Blank   131.289k i/100ms
  Fast ActiveSupport   127.899k i/100ms
          Slow Blank     1.882M i/100ms
      New Slow Blank    87.353k i/100ms
     Latest AR Blank     1.958M i/100ms
Calculating -------------------------------------
          Fast Blank      1.308M (± 3.3%) i/s -      6.564M in   5.024930s
  Fast ActiveSupport      1.251M (± 5.4%) i/s -      6.267M in   5.024276s
          Slow Blank     18.412M (± 4.8%) i/s -     92.209M in   5.020928s
      New Slow Blank      1.021M (±67.0%) i/s -      2.795M in   5.207333s
     Latest AR Blank     19.142M (±10.9%) i/s -     95.918M in   5.091316s

Comparison:
     Latest AR Blank: 19142242.3 i/s
          Slow Blank: 18411568.1 i/s - same-ish: difference falls within error
          Fast Blank:  1307847.7 i/s - 14.64x  (± 0.00) slower
  Fast ActiveSupport:  1251278.8 i/s - 15.30x  (± 0.00) slower
      New Slow Blank:  1021134.0 i/s - 18.75x  (± 0.00) slower

benchmark

#!/usr/bin/env ruby
$: << File.dirname(__FILE__)+'/lib'
require 'benchmark/ips'
require 'fast_blank'

class String
  # active support implementation
  BLANK_RE = /\A[[:space:]]*\z/
  def slow_blank?
    /\A[[:space:]]*\z/ === self
  end

  def new_slow_blank?
    empty? || !(/[[:^space:]]/ === self)
  end

  def latest_ar_blank?
    empty? || BLANK_RE.match?(self)
  end
end

test_strings = [
  "",
  "\r\n\r\n  ",
  "this is a test",
  "   this is a longer test",
  "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"
]

test_strings.each do |s|
  raise "failed on #{s.inspect}" if s.blank? != s.slow_blank?
  raise "failed on #{s.inspect}" if s.blank? != s.new_slow_blank?
end

test_strings.each do |s|
  puts "\n================== Test String Length: #{s.length} =================="
  Benchmark.ips do |x|
    x.report("Fast Blank") do |times|
      i = 0
      while i < times
        s.blank?
        i += 1
      end
    end

    x.report("Fast ActiveSupport") do |times|
      i = 0
      while i < times
        s.blank_as?
        i += 1
      end
    end

    x.report("Slow Blank") do |times|
      i = 0
      while i < times
        s.slow_blank?
        i += 1
      end
    end

    x.report("New Slow Blank") do |times|
      i = 0
      while i < times
        s.new_slow_blank?
        i += 1
      end
    end
    
    x.report("Latest AR Blank") do |times|
      i = 0
      while i < times
        s.latest_ar_blank?
        i += 1
      end
    end

    x.compare!
  end
end

gogainda avatar Sep 06 '21 13:09 gogainda

Thanks, I'll take a look. Self-note: the while i < times in the benchmark might actually make things less stable, I'll try without it to check.

Might also want to run without the C ext cases, or by .dup the string for each impl in case those affect the performance of other methods. That is likely actually, the C ext will RSTRING_PTR which converts the String to native, and if later we need it managed (to match a Regexp), we'll need to convert back and include that extra logic in the JITed code.

eregon avatar Sep 06 '21 14:09 eregon

I'm looking at String#blank? from ActiveSupport independently of this issue as part of an in-depth look at optimizing regexps in TruffleRuby. I had planned on looking at fast_blank, too. I'm happy to collaborate with anyone on it.

nirvdrum avatar Sep 07 '21 16:09 nirvdrum

I've begun looking at this a bit. As Benoit noted, the new_slow_blank? method does not match what ActiveSupport does. I dug into the history and it looks like the definition used to match, but changed five years ago. I'll open up a PR to fix that in fast_blank. While that doesn't have any bearing on TruffleRuby's fast_blank extension performance, it does help ensure we're comparing the right things.

With that change, the difference between slow_blank? and new_slow_blank? basically disappears, except for the empty input case. By way of explanation, the TRegex engine is able to optimize regex patterns that use anchors considerably better than those just looking for a pattern at an arbitrary position in the match string.

Oddly, my latest run doesn't align with the one from @gogainda in https://github.com/oracle/truffleruby/issues/2439#issuecomment-913655560. In particular, I'm seeing the "Fast ActiveSupport" method win out with the 6 character long string. I'm going to spend more time investigating why the pure Ruby implementations are slower on smaller strings. As Benoit noted, they shouldn't be. From there, I'll look into the C extension methods.

================== Test String Length: 0 ==================
Warming up --------------------------------------
          Fast Blank   120.423k i/100ms
  Fast ActiveSupport   260.257k i/100ms
          Slow Blank   199.128k i/100ms
      New Slow Blank    31.024M i/100ms
Calculating -------------------------------------
          Fast Blank      9.307M (± 7.2%) i/s -     46.242M in   5.002693s
  Fast ActiveSupport      8.180M (± 7.1%) i/s -     40.860M in   5.029876s
          Slow Blank      2.454M (±58.2%) i/s -      8.164M in   5.053282s
      New Slow Blank    307.158M (± 3.5%) i/s -      1.551B in   5.056869s

Comparison:
      New Slow Blank: 307157704.4 i/s
          Fast Blank:  9306826.4 i/s - 33.00x  (± 0.00) slower
  Fast ActiveSupport:  8179769.4 i/s - 37.55x  (± 0.00) slower
          Slow Blank:  2454167.7 i/s - 125.16x  (± 0.00) slower


================== Test String Length: 6 ==================
Warming up --------------------------------------
          Fast Blank   275.636k i/100ms
  Fast ActiveSupport    67.318k i/100ms
          Slow Blank   192.713k i/100ms
      New Slow Blank   100.374k i/100ms
Calculating -------------------------------------
          Fast Blank      3.226M (±13.7%) i/s -     15.987M in   5.074911s
  Fast ActiveSupport      3.516M (± 6.1%) i/s -     17.503M in   5.002017s
          Slow Blank      1.743M (±75.4%) i/s -      6.360M in   5.124090s
      New Slow Blank      2.040M (±57.7%) i/s -      6.324M in   5.035279s

Comparison:
  Fast ActiveSupport:  3515607.1 i/s
          Fast Blank:  3226493.1 i/s - same-ish: difference falls within error
      New Slow Blank:  2039591.2 i/s - 1.72x  (± 0.00) slower
          Slow Blank:  1743440.2 i/s - 2.02x  (± 0.00) slower


================== Test String Length: 14 ==================
Warming up --------------------------------------
          Fast Blank   473.156k i/100ms
  Fast ActiveSupport   165.767k i/100ms
          Slow Blank     5.907M i/100ms
      New Slow Blank     5.047M i/100ms
Calculating -------------------------------------
          Fast Blank      5.719M (± 3.2%) i/s -     28.863M in   5.052826s
  Fast ActiveSupport      5.596M (± 2.1%) i/s -     28.015M in   5.008551s
          Slow Blank     57.355M (± 6.5%) i/s -    289.433M in   5.071272s
      New Slow Blank     51.431M (± 7.6%) i/s -    257.373M in   5.039350s

Comparison:
          Slow Blank: 57354724.3 i/s
      New Slow Blank: 51430894.9 i/s - same-ish: difference falls within error
          Fast Blank:  5718579.8 i/s - 10.03x  (± 0.00) slower
  Fast ActiveSupport:  5596006.4 i/s - 10.25x  (± 0.00) slower


================== Test String Length: 24 ==================
Warming up --------------------------------------
          Fast Blank   320.204k i/100ms
  Fast ActiveSupport   314.719k i/100ms
          Slow Blank     3.323M i/100ms
      New Slow Blank     4.490M i/100ms
Calculating -------------------------------------
          Fast Blank      3.518M (± 3.7%) i/s -     17.611M in   5.015159s
  Fast ActiveSupport      3.340M (± 0.7%) i/s -     16.995M in   5.088683s
          Slow Blank     53.065M (± 1.7%) i/s -    265.839M in   5.011134s
      New Slow Blank     45.636M (± 9.4%) i/s -    229.013M in   5.074606s

Comparison:
          Slow Blank: 53064631.2 i/s
      New Slow Blank: 45636432.1 i/s - 1.16x  (± 0.00) slower
          Fast Blank:  3517504.2 i/s - 15.09x  (± 0.00) slower
  Fast ActiveSupport:  3339912.3 i/s - 15.89x  (± 0.00) slower


================== Test String Length: 136 ==================
Warming up --------------------------------------
          Fast Blank   176.592k i/100ms
  Fast ActiveSupport   159.082k i/100ms
          Slow Blank     3.347M i/100ms
      New Slow Blank     2.845M i/100ms
Calculating -------------------------------------
          Fast Blank      1.887M (± 4.9%) i/s -      9.536M in   5.067579s
  Fast ActiveSupport      1.879M (± 6.7%) i/s -      9.386M in   5.021470s
          Slow Blank     36.291M (± 8.4%) i/s -    180.758M in   5.020799s
      New Slow Blank     33.766M (± 7.9%) i/s -    167.879M in   5.010004s

Comparison:
          Slow Blank: 36291313.4 i/s
      New Slow Blank: 33766400.9 i/s - same-ish: difference falls within error
          Fast Blank:  1887041.7 i/s - 19.23x  (± 0.00) slower
  Fast ActiveSupport:  1879048.6 i/s - 19.31x  (± 0.00) slower

nirvdrum avatar Sep 09 '21 02:09 nirvdrum

As a follow-up, I did extract the code out into simpler benchmarks without the embedded while loops and used the benchmark-ips hold! method to ensure different benchmarks weren't adversely affecting each other. The slow_blank? and new_slow_blank? methods see a fairly large improvement:

String length: 136
Warming up --------------------------------------
         slow_blank?    10.381M i/100ms
Calculating -------------------------------------
         slow_blank?    103.737M (± 2.5%) i/s -      1.038B in  10.013468s

Pausing here -- run Ruby again to measure the next benchmark...
String length: 136
Warming up --------------------------------------
     new_slow_blank?     9.502M i/100ms
Calculating -------------------------------------
     new_slow_blank?     94.746M (± 2.4%) i/s -    950.232M in  10.035413s

Comparison:
         slow_blank?: 103736848.6 i/s
     new_slow_blank?: 94745893.2 i/s - 1.09x  (± 0.00) slower

Using frozen strings, the IPS value jumps even higher:

String length: 136
Warming up --------------------------------------
         slow_blank?    12.221M i/100ms
Calculating -------------------------------------
         slow_blank?    124.859M (± 3.0%) i/s -      1.259B in  10.091475s

Pausing here -- run Ruby again to measure the next benchmark...
String length: 136
Warming up --------------------------------------
     new_slow_blank?    13.120M i/100ms
Calculating -------------------------------------
     new_slow_blank?    131.366M (± 3.8%) i/s -      1.325B in  10.104700s

Comparison:
     new_slow_blank?: 131365974.9 i/s
         slow_blank?: 124858962.0 i/s - same-ish: difference falls within error

I'm reticent to go changing the way benchmarks are run while trying to improve that benchmark performance. Especially where this block form of report is supposed to reduce overhead. But, it looks like that form of the call is adding overhead that's impacting the benchmark execution. I'll look at that in more detail later.

nirvdrum avatar Sep 09 '21 03:09 nirvdrum

IMHO benchmark-ips is most reliable in the regular block form (no while inside), so if it doesn't make much difference for CRuby I think it's best to use that.

I think it's fair and good to measure both blanks and non-blanks strings in the same process, but ideally there should be a way to run with a single blank implementation in the process, since otherwise they can have side effects like mutating the string. Just s = s.dup before each report block might be a way to avoid the interference.

# frozen_string_literal: true has the effect to make the strings ImmutableRubyString and those always remain managed (good for Regexp-based impls), at the cost of a map lookup when needing the native char* for it. That's not ideal for measuring the C ext performance. We could add an inline cache in StringToNativeNode in single-context mode for ImmutableRubyString though.

I think s = s.dup is the easiest fix to still run everything in the same process and run a fair comparison.

eregon avatar Sep 09 '21 10:09 eregon

IMHO benchmark-ips is most reliable in the regular block form (no while inside), so if it doesn't make much difference for CRuby I think it's best to use that.

If we can make a case for it, then maybe it's worth getting it changed in the benchmark-ips README. The docs have an example showing the inner while loop with the comment:

# To reduce overhead, the number of iterations is passed in
# and the block must run the code the specific number of times.
# Used for when the workload is very small and any overhead
# introduces incorrectable errors

I think it's fair and good to measure both blanks and non-blanks strings in the same process, but ideally there should be a way to run with a single blank implementation in the process, since otherwise they can have side effects like mutating the string. Just s = s.dup before each report block might be a way to avoid the interference.

Agreed. I'll give that a try.

# frozen_string_literal: true has the effect to make the strings ImmutableRubyString and those always remain managed (good for Regexp-based impls), at the cost of a map lookup when needing the native char* for it. That's not ideal for measuring the C ext performance. We could add an inline cache in StringToNativeNode in single-context mode for ImmutableRubyString though.

I think s = s.dup is the easiest fix to still run everything in the same process and run a fair comparison.

Thanks for the explanation. I forgot about the extra look-up.

nirvdrum avatar Sep 09 '21 11:09 nirvdrum

I'm just circling back to this. We made some big improvements to regex processing that should ship in TruffleRuby 22.0.0. Running benchmarks on a native Ryzen 3700X system with Ubuntu 21.04, TruffleRuby's execution of ActiveSupport's String#blank? was ~5x the speed of _fast_blank's native extension with MRI 3.0.3. We should still make this extension run faster, but as Sam suggested, it might just make sense to have _fast_blank use Ruby rather than C on TruffleRuby.

In any event, with some of the regex stuff that we knew was problematic out of the way, it'll be easier to look at just the native extension execution in TruffleRuby now.

nirvdrum avatar Dec 07 '21 21:12 nirvdrum