cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-117431: Improve performance of startswith and endswith (version 2)

Open eendebakpt opened this issue 1 year ago • 0 comments

Improve performance of startswith and endswith by eliminating double work in tailmatch.

Benchmark results:

single-character match: x.startswith('a'): Mean +- std dev: [main_startswith] 80.6 ns +- 0.7 ns -> [v2_startswith] 69.8 ns +- 0.8 ns: 1.15x faster
single-character fail: x.startswith('q'): Mean +- std dev: [main_startswith] 67.8 ns +- 0.6 ns -> [v2_startswith] 68.4 ns +- 0.7 ns: 1.01x slower
two-character match: x.startswith('ab'): Mean +- std dev: [main_startswith] 81.3 ns +- 2.1 ns -> [v2_startswith] 75.9 ns +- 0.6 ns: 1.07x faster
two-character fail head: x.startswith('qb'): Mean +- std dev: [main_startswith] 68.0 ns +- 1.1 ns -> [v2_startswith] 76.0 ns +- 0.9 ns: 1.12x slower
two-character fail tail: x.startswith('aq'): Mean +- std dev: [main_startswith] 73.0 ns +- 3.3 ns -> [v2_startswith] 68.5 ns +- 0.9 ns: 1.07x faster
multi-character match: x.startswith('abcdefghijkl'): Mean +- std dev: [main_startswith] 84.9 ns +- 1.1 ns -> [v2_startswith] 78.7 ns +- 2.3 ns: 1.08x faster
multi-character fail midle: x.startswith('abcdef_hijkl'): Mean +- std dev: [main_startswith] 85.3 ns +- 2.0 ns -> [v2_startswith] 78.6 ns +- 0.8 ns: 1.09x faster

Benchmark hidden because not significant (3): empty: x.startswith(''), multi-character different kind match: xu.startswith('abcdefghijkl'), multi-character different kind fail: xu.startswith('abcdef_hijkl')

Geometric mean: 1.03x faster
Benchmark script
import pyperf
runner = pyperf.Runner()

setup="""
x = 'abcdefghijklmnop'
y = 'abcdefghijklmnop_bbbbbbbbbbbbbbbbbb'

xu = x + '\u1234'
yu = y + '\u1234'
l = 'a' * 1000 + 'b'
x_startswith = x.startswith
"""

# Tested with ./python sw.py --rigorous -o main_startswith.json

if 1:
    runner.timeit(name="empty: x.startswith('')", stmt="x.startswith(''); y.startswith('')", setup=setup)
    runner.timeit(name="single-character match: x.startswith('a')", stmt="x.startswith('a'); y.startswith('a')", setup=setup)
    runner.timeit(name="single-character fail: x.startswith('q')", stmt="x.startswith('q'); y.startswith('q')", setup=setup)
    
    runner.timeit(name="two-character match: x.startswith('ab')", stmt="x.startswith('ab'); y.startswith('ab')", setup=setup)
    runner.timeit(name="two-character fail head: x.startswith('qb')", stmt="x.startswith('qb'); y.startswith('qb')", setup=setup)
    runner.timeit(name="two-character fail tail: x.startswith('aq')", stmt="x.startswith('aq'); y.startswith('aq')", setup=setup)
    runner.timeit(name="multi-character match: x.startswith('abcdefghijkl')", stmt="x.startswith('abcdefghijkl'); y.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character fail midle: x.startswith('abcdef_hijkl')", stmt="x.startswith('abcdef_hijkl'); y.startswith('abcdef_hijkl')", setup=setup)

    runner.timeit(name="multi-character different kind match: xu.startswith('abcdefghijkl')", stmt="xu.startswith('abcdefghijkl'); yu.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character different kind fail: xu.startswith('abcdef_hijkl')", stmt="xu.startswith('abcdefghijkl'); yu.startswith('abcdefghijkl')", setup=setup)
  • By first checking the tail of the substring and then the start, we can combine a call to PyUnicode_READ and memcmp
  • For the single character case we prevent the check with PyUnicode_READ from happening twice.
  • With this PR the performance of many cases improves, in particular the case where substrings match . The only case where performance is less, is for substrings that fail to match on the start of the substring (but the performance for substrings that fail on the end of the substring improves).
  • Also see #117480
  • Issue: gh-117431

eendebakpt avatar Apr 11 '24 20:04 eendebakpt