rubocop-performance icon indicating copy to clipboard operation
rubocop-performance copied to clipboard

Detect regexes where a string could be used in .split

Open tas50 opened this issue 4 years ago • 6 comments

Is your feature request related to a problem? Please describe.

I've noticed that sometimes regexes are used when a simple string would work fine. The code below could / should be simplified to be just a string

so.split(/:/)

Describe the solution you'd like

Autocorrect to

so.split(':')

This is more clear to anyone reading the code and it's 3x faster:

Warming up --------------------------------------
         Regex Split    94.649k i/100ms
        String Split   290.261k i/100ms
Calculating -------------------------------------
         Regex Split    939.299k (± 2.0%) i/s -      4.732M in   5.040237s
        String Split      2.877M (± 1.7%) i/s -     14.513M in   5.046006s

Comparison:
        String Split:  2877000.8 i/s
         Regex Split:   939298.7 i/s - 3.06x  (± 0.00) slower

Describe alternatives you've considered

Not this time

tas50 avatar Oct 30 '20 19:10 tas50

I think we have handled some code similar to this in the Performance cops. Specifically, I think Performance/StringReplacement does some basic checks to see if we can swap regex for a string. I vaguely remember looking into something like this a while back. I think one of the concerns was that the performance of regex vs string changed based on the Ruby version.

rrosenblum avatar Nov 03 '20 19:11 rrosenblum

@rrosenblum Thanks for the suggestion. I will transfer this issue to rubocop-performance because https://github.com/rubocop-hq/rubocop-performance/pull/190 has been opened.

koic avatar Nov 08 '20 15:11 koic

I'm not seeing it:

 user               system    total              real
String#split(//)  0.765812   0.000649   0.766461 (  0.766922)
String#split('')  0.809555   0.000397   0.809952 (  0.810363)
require "benchmark"

MAX = 2000000

Benchmark.bm(30) do |x|
  x.report("String#split(//)"){
    string = "hello"
    MAX.times{ string.split(//) }
  }

  x.report("String#split('')"){
    string = "hello"
    MAX.times{ string.split('') }
  }
end

djberg96 avatar Jul 21 '21 04:07 djberg96

@djberg96 what ruby release are you running. Here are my results on 2.7.4 from your benchmark snippet:

                                     user     system      total        real
String#split(//)                11.726928   0.100159  11.827087 ( 11.910536)
String#split('')                 2.162106   0.020655   2.182761 (  2.202049)

tas50 avatar Jul 21 '21 04:07 tas50

@tas50 That was 3.0.1.

I just tried 2.7.2 and 2.6.6 as well. The 2.7.x results were similar to yours, whereas the 2.6.x results showed both much slower, but the regex was slightly faster.

djberg96 avatar Jul 21 '21 04:07 djberg96

We've known for a while that the performance measurements for some of the cops can change across Ruby versions. I don't think we document the performance benefits based on Ruby version. Maybe one solution to this that we start documenting performance across Ruby versions, and we start to use that information for targeting whether a cop is enabled or not.

benchmark-ips is the recommended tool for generating performance reports for snippets of code

Many of the cops are based on snippets from fast-ruby. Unfortunately, this project doesn't appear to be well maintained anymore. At one point in time, they did have a build that would show off performance for different versions of Ruby

rrosenblum avatar Jul 21 '21 18:07 rrosenblum