Performance Improvement

Using enumerate(...) in for loops are slow and there are at least +250 enumerate's in CPython

Let's take for example this for loop with enumerate:

https://github.com/python/cpython/blob/5f660e8e2ca3acfb89ccbdd990f072149b6baa6a/Lib/genericpath.py#L116-L118

If you remove enumerate, this method will run 5% to 7% faster:

    i = 0
    for c in s1:
        if c != s2[i]:
            return s1[:i]
        i += 1

Complete performance test script, basically I copied twice the original commonprefix(m) and then I modified the last one:

import os
import time

# From file CPython/Lib/genericpath.py

# Return the longest prefix of all list elements.
def old_commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    # Some people pass in a list of pathname parts to operate in an OS-agnostic
    # fashion; don't try to translate in that case as that's an abuse of the
    # API and they are already doing what they need to be OS-agnostic and so
    # they most likely won't be using an os.PathLike object in the sublists.
    if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    # Some people pass in a list of pathname parts to operate in an OS-agnostic
    # fashion; don't try to translate in that case as that's an abuse of the
    # API and they are already doing what they need to be OS-agnostic and so
    # they most likely won't be using an os.PathLike object in the sublists.
    if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
    s1 = min(m)
    s2 = max(m)
    i = 0
    for c in s1:
        if c != s2[i]:
            return s1[:i]
        i += 1
    return s1

def perfomance(func):
    start = time.time()
    for _ in range(1000000):
        func(["/usr/lib", "/usr/local/lib", "/usr/local/bin", "/usr/bin", "/usr/local/share"])
    end = time.time()
    return end - start

old_perfomance = perfomance(old_commonprefix)
new_perfomance = perfomance(commonprefix)
perc = (old_perfomance / new_perfomance) * 100

print(f"with enumerate: {old_perfomance} secs")
print(f"without enumerate: {new_perfomance} secs")
print(f"New version is {perc:.2f}% faster")

You will get this performance improvement:

with enumerate: 0.8863015174865723 secs
without enumerate: 0.8414063453674316 secs
New version is 105.34% faster

Running this commands, you can count and find all for loops with enumerate in /Lib for example

$ grep -r "in enumerate" ./Lib | wc -l
$ grep -rl "in enumerate" ./Lib

If this is useful for CPython, I can create specific issues+PR replacing enumerate(...)'s.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Jul 04 '24 20:07 LimaBD

This sounds like an issue for https://github.com/faster-cpython/ideas. Someone clever might optimise enumerate().

Jul 04 '24 21:07 nineteendo

Yes enumerate will introduce some overhead. In your example, the one-time overhead to create the enumerate object is also significant. If you increase the common prefix, the diff would be ~2% to 3%. Yes it's "slower" than incrementing an index by yourself, but sometimes that's the cost of making the code look good. Basically under the hood, a new tuple is created and unpacked.

An important thing to notice is that your loop is basically an "empty loop", which makes the cost of the loop itself more significant. In more common cases, where the loop has more code to execute, the impact will be smaller, possibly negligible.

No, you should not replace all the enumerates in CPython. Like I mentioned above, it's not that slow, in many cases you won't be able to get the difference. More importantly, performance is not the only golden rule for CPython. Code readability is critical for maintenance and that's the reason we have enumerate.

Jul 04 '24 21:07 gaogaotiantian

The timings bounce around from version to version:

% python3.13 time_enumerate.py
1.7251772080198862 enum
1.717002582969144 += 1

% python3.12 time_enumerate.py
1.7478526670020074 enum
1.7903573340154253 += 1

% python3.11 time_enumerate.py
1.5525686250184663 enum
1.4752536249579862 += 1

% python3.10 time_enumerate.py
1.713427166978363 enum
1.955500791023951 += 1

Timing code:

def f(s):
    for i, x in enumerate(s):
        pass

def g(s):
    i = 0
    for x in s:
        i += 1

s = [None] * 10_000


if __name__ == '__main__':

    from timeit import repeat

    n = 10 ** 4
    print(min(repeat('f(s)', 'from __main__ import f, g, s', number=n)), 'enum')
    print(min(repeat('g(s)', 'from __main__ import f, g, s', number=n)), '+= 1')

As noted above, readability is the primary goal. Marking as closed.

Jul 05 '24 14:07 rhettinger

enumerate's are slow

Performance Improvement

Has this already been discussed elsewhere?

Links to previous discussion of this feature: