cpython gh-117431: Improve performance of startswith and endswith (version 2)

Open eendebakpt opened this issue 1 year ago • 0 comments

Improve performance of startswith and endswith by eliminating double work in tailmatch.

Benchmark results:

single-character match: x.startswith('a'): Mean +- std dev: [main_startswith] 80.6 ns +- 0.7 ns -> [v2_startswith] 69.8 ns +- 0.8 ns: 1.15x faster
single-character fail: x.startswith('q'): Mean +- std dev: [main_startswith] 67.8 ns +- 0.6 ns -> [v2_startswith] 68.4 ns +- 0.7 ns: 1.01x slower
two-character match: x.startswith('ab'): Mean +- std dev: [main_startswith] 81.3 ns +- 2.1 ns -> [v2_startswith] 75.9 ns +- 0.6 ns: 1.07x faster
two-character fail head: x.startswith('qb'): Mean +- std dev: [main_startswith] 68.0 ns +- 1.1 ns -> [v2_startswith] 76.0 ns +- 0.9 ns: 1.12x slower
two-character fail tail: x.startswith('aq'): Mean +- std dev: [main_startswith] 73.0 ns +- 3.3 ns -> [v2_startswith] 68.5 ns +- 0.9 ns: 1.07x faster
multi-character match: x.startswith('abcdefghijkl'): Mean +- std dev: [main_startswith] 84.9 ns +- 1.1 ns -> [v2_startswith] 78.7 ns +- 2.3 ns: 1.08x faster
multi-character fail midle: x.startswith('abcdef_hijkl'): Mean +- std dev: [main_startswith] 85.3 ns +- 2.0 ns -> [v2_startswith] 78.6 ns +- 0.8 ns: 1.09x faster

Benchmark hidden because not significant (3): empty: x.startswith(''), multi-character different kind match: xu.startswith('abcdefghijkl'), multi-character different kind fail: xu.startswith('abcdef_hijkl')

Geometric mean: 1.03x faster
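
As a rough cross-check of the summary figure, the geometric mean can be recomputed from the rounded speedups listed above. This is only a sketch: it assumes the three benchmarks hidden as not significant each contribute a ratio of 1.0, and it works from the rounded values rather than the raw pyperf data, so it can only approximate what pyperf reports.

```python
import math

# Speedup ratios taken from the rounded figures above
# (values > 1 mean faster, values < 1 mean slower).
reported = [1.15, 1 / 1.01, 1.07, 1 / 1.12, 1.07, 1.08, 1.09]

# Assumption: the three hidden, not-significant benchmarks count as 1.0.
hidden = [1.0, 1.0, 1.0]


def geometric_mean(ratios):
    """Return the geometric mean of a list of positive ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


overall = geometric_mean(reported + hidden)
print(f"Geometric mean: {overall:.2f}x")
```

Under these assumptions the result lands close to the reported 1.03x. Averaging only the seven significant benchmarks gives a somewhat higher figure.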

Benchmark script

import pyperf
runner = pyperf.Runner()

setup="""
x = 'abcdefghijklmnop'
y = 'abcdefghijklmnop_bbbbbbbbbbbbbbbbbb'

xu = x + '\u1234'
yu = y + '\u1234'
l = 'a' * 1000 + 'b'
x_startswith = x.startswith
"""

# Tested with ./python sw.py --rigorous -o main_startswith.json

if 1:
    runner.timeit(name="empty: x.startswith('')", stmt="x.startswith(''); y.startswith('')", setup=setup)
    runner.timeit(name="single-character match: x.startswith('a')", stmt="x.startswith('a'); y.startswith('a')", setup=setup)
    runner.timeit(name="single-character fail: x.startswith('q')", stmt="x.startswith('q'); y.startswith('q')", setup=setup)
    
    runner.timeit(name="two-character match: x.startswith('ab')", stmt="x.startswith('ab'); y.startswith('ab')", setup=setup)
    runner.timeit(name="two-character fail head: x.startswith('qb')", stmt="x.startswith('qb'); y.startswith('qb')", setup=setup)
    runner.timeit(name="two-character fail tail: x.startswith('aq')", stmt="x.startswith('aq'); y.startswith('aq')", setup=setup)
    runner.timeit(name="multi-character match: x.startswith('abcdefghijkl')", stmt="x.startswith('abcdefghijkl'); y.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character fail midle: x.startswith('abcdef_hijkl')", stmt="x.startswith('abcdef_hijkl'); y.startswith('abcdef_hijkl')", setup=setup)

    runner.timeit(name="multi-character different kind match: xu.startswith('abcdefghijkl')", stmt="xu.startswith('abcdefghijkl'); yu.startswith('abcdefghijkl')", setup=setup)
    runner.timeit(name="multi-character different kind fail: xu.startswith('abcdef_hijkl')", stmt="xu.startswith('abcdef_hijkl'); yu.startswith('abcdef_hijkl')", setup=setup)

By checking the tail of the substring first and then the start, we can combine a call to PyUnicode_READ with the call to memcmp.
For the single-character case, this prevents the PyUnicode_READ check from happening twice.
With this PR, performance improves in many cases, in particular when the substring matches. The only case that gets slower is a substring that fails to match at its start (while a substring that fails to match at its end gets faster).
See also #117480.
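
The check order described above can be illustrated with a small pure-Python model. This is a sketch, not the C code in CPython: the function name `tailmatch_sketch` and its `direction` parameter are hypothetical, the single-character index stands in for PyUnicode_READ, and the slice comparison stands in for memcmp.

```python
def tailmatch_sketch(text: str, sub: str, direction: int = -1) -> bool:
    """Model of the tail-first check order (hypothetical helper).

    direction < 0 behaves like startswith, direction >= 0 like endswith.
    """
    n, m = len(text), len(sub)
    if m > n:
        return False
    if m == 0:
        return True

    # Position in `text` where the candidate match would begin.
    offset = 0 if direction < 0 else n - m

    # Step 1: one cheap read of the last character of the substring
    # (stands in for PyUnicode_READ).
    if text[offset + m - 1] != sub[m - 1]:
        return False

    # Step 2: one bulk comparison of the remaining m - 1 characters,
    # which includes the first character (stands in for memcmp).
    return text[offset:offset + m - 1] == sub[:m - 1]


x = "abcdefghijklmnop"
print(tailmatch_sketch(x, "ab"), tailmatch_sketch(x, "aq"), tailmatch_sketch(x, "qb"))
```

In this model a substring that fails on its last character ('aq') is rejected in step 1 without any bulk comparison, while one that fails on its first character ('qb') passes step 1 and is only rejected in step 2. That matches the direction of the benchmark results: fail-tail gets faster and fail-head gets slower.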

Issue: gh-117431

Apr 11 '24 20:04 eendebakpt
