flake8-comprehensions icon indicating copy to clipboard operation
flake8-comprehensions copied to clipboard

any/all short circuiting

Open sbrugman opened this issue 3 years ago • 2 comments

Description

List comprehension can postpone the evaluation of any/all, which can hurt performance for larger iterables. Surprisingly, this was not yet included here as rule.

Example:

def hi():
    print('hi')
    return True

>>> any(hi() for num in [1, 2, 3, 4])
hi

>>> any([hi() for num in [1, 2, 3, 4]])
hi
hi
hi
hi

From this answer

Proposal

Extend the current set of rules with C418 and C419 (any, all), similar to C403 and C404

sbrugman avatar May 16 '22 17:05 sbrugman

Thanks for the suggestion.

In the past I removed some rules that suggested using generators over comprehensions, because the increase in laziness is not always a 100% compatible change: https://github.com/adamchainz/flake8-comprehensions/blob/main/HISTORY.rst#340-2021-03-18 . In the case of any/all, any side effects from the comprehension would be lost. I prefer to have rules don't have any false positives.

Also generator expressions are not always faster. For small-ish collections, list comprehensions often win.

I'd be open to having a set of default-disabled rules that one opts into though. Perhaps there could be a setting ("suggest-generators" or something) that users add with the understanding that semantics could change...

Thoughts?

adamchainz avatar May 19 '22 08:05 adamchainz

Thanks for the useful pointers.

The scope of C407 used to be much broader, considering all builtins - frequently resulting in false positives. For any/all, there are clear cases where the generator expression is preferable.

The choice for generator or comprehension should indeed be made consciously (and flake8-comprehensions should help with pointing that out). Opting-in and having a proper disclaimer seems like a great solution.

Notes on performance, for future reference:

A fully exhausted generator is slower than an equivalent comprehension. However, any and all will terminate early when the first True or False value respectively is encountered. As a rough estimate, if a fully exhausted generator is 50% slower (reasonable based on benchmarks), compared to an equivalent comprehension, then on average, a generator may be preferable when an early termination value is expected before 2/3rds of the sequence is processed. Note that the generator variant is more memory efficient.

sbrugman avatar May 19 '22 09:05 sbrugman

@adamchainz Here is some performance bench-marking specifically for any / all: https://www.katrin-affolter.ch/Programming/performance_of_all_and_any

Skylion007 avatar Feb 18 '23 22:02 Skylion007

FYI for any who wants to use this rule, it's implemented in flake8-pie as PIE802.

Skylion007 avatar Feb 21 '23 18:02 Skylion007

Thanks @Skylion007 , since i'ts implemented elsewhere I think we can close this. I think I'll avoid adding rules that change laziness to this plugin, it'll just be easier to maintain and understand.

adamchainz avatar Mar 18 '23 11:03 adamchainz

Actually there's a PR open, I think we cxan review and add this.

adamchainz avatar Mar 19 '23 08:03 adamchainz

There is one counter example here, but the potential speedups are potentially worth it: https://github.com/charliermarsh/ruff/issues/3259#issuecomment-1454335988

Skylion007 avatar Mar 19 '23 17:03 Skylion007

The counterexample isn't for any/all ?

adamchainz avatar Mar 27 '23 14:03 adamchainz