ropey icon indicating copy to clipboard operation
ropey copied to clipboard

accelerate line iterator

Open pascalkuthe opened this issue 4 months ago • 2 comments

ropeys line iterator doesn't do so well compared to other ropes and I always wondered why. My recent work on a streaming regex engine made clear to me why: The string search routines we use for finding lines are not well optimized.

There are a couple of reasons I found (there may ben more):

  • The reveres search did not use any simd at all
  • The forward search does use simd (line_to_byte_idx) but these functions from str_indices are more optimized for counting/larger counts and don't do quite as well in the latency department (when line_idx=1)
  • ropey support unicode line breaks (just turning those off makes a huge difference)

I optimized the str utilities used by the line iterator to adress the first two and somewhat the last point (nothing else uses these so nothing else is affected). memchr can be used to accelerate both forward and reverse searches. This was sadly not possible for the unicode lines case because memchr can only search for a set of up to three bytes. The aoh-corasic crate contains vectorized implementations that would allow accelerating these cases, but I didn't want to pull that in.

Some data, run a benchmark with:

  • cargo bench _lines --no-default-features --features "simd" on the current master branch
  • cargo bench _lines --no-default-features --features "simd,memchr" on this branch

Results:

iter_prev_lines/lines   time:   [34.199 ns 34.259 ns 34.320 ns] (was 51.73ns)
                        change: [-33.702% -33.533% -33.322%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
iter_prev_lines/lines_tiny
                        time:   [21.990 ns 22.012 ns 22.036 ns] (was 37.55ns)
                        change: [-41.540% -41.456% -41.369%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

iter_next_lines/lines   time:   [34.450 ns 34.552 ns 34.645 ns] (was 39.25ns)
                        change: [-13.055% -12.513% -12.145%] (p = 0.00 < 0.05)
                        Performance has improved.
iter_next_lines/lines_tiny
                        time:   [29.280 ns 29.383 ns 29.538 ns] (was 35.79ns)
                        change: [-17.247% -16.909% -16.571%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)

memchr is behind a feature flag because it feels unfortunate to have a dependency not used by the default feature flag. Sadly you can't make a feature flag turn off a dependency so this hack will have to do. I think unicode line endings are one of things that are pedantically correct, but I suspect the majority of users will turn off and benefit from these optimizations.

pascalkuthe avatar Apr 08 '24 19:04 pascalkuthe