any/all short circuiting
Description
List comprehension can postpone the evaluation of any/all, which can hurt performance for larger iterables. Surprisingly, this was not yet included here as rule.
Example:
def hi():
print('hi')
return True
>>> any(hi() for num in [1, 2, 3, 4])
hi
>>> any([hi() for num in [1, 2, 3, 4]])
hi
hi
hi
hi
From this answer
Proposal
Extend the current set of rules with C418 and C419 (any, all), similar to C403 and C404
Thanks for the suggestion.
In the past I removed some rules that suggested using generators over comprehensions, because the increase in laziness is not always a 100% compatible change: https://github.com/adamchainz/flake8-comprehensions/blob/main/HISTORY.rst#340-2021-03-18 . In the case of any/all, any side effects from the comprehension would be lost. I prefer to have rules don't have any false positives.
Also generator expressions are not always faster. For small-ish collections, list comprehensions often win.
I'd be open to having a set of default-disabled rules that one opts into though. Perhaps there could be a setting ("suggest-generators" or something) that users add with the understanding that semantics could change...
Thoughts?
Thanks for the useful pointers.
The scope of C407 used to be much broader, considering all builtins - frequently resulting in false positives. For any/all, there are clear cases where the generator expression is preferable.
The choice for generator or comprehension should indeed be made consciously (and flake8-comprehensions should help with pointing that out). Opting-in and having a proper disclaimer seems like a great solution.
Notes on performance, for future reference:
A fully exhausted generator is slower than an equivalent comprehension. However,
anyandallwill terminate early when the firstTrueorFalsevalue respectively is encountered. As a rough estimate, if a fully exhausted generator is 50% slower (reasonable based on benchmarks), compared to an equivalent comprehension, then on average, a generator may be preferable when an early termination value is expected before 2/3rds of the sequence is processed. Note that the generator variant is more memory efficient.
@adamchainz Here is some performance bench-marking specifically for any / all: https://www.katrin-affolter.ch/Programming/performance_of_all_and_any
FYI for any who wants to use this rule, it's implemented in flake8-pie as PIE802.
Thanks @Skylion007 , since i'ts implemented elsewhere I think we can close this. I think I'll avoid adding rules that change laziness to this plugin, it'll just be easier to maintain and understand.
Actually there's a PR open, I think we cxan review and add this.
There is one counter example here, but the potential speedups are potentially worth it: https://github.com/charliermarsh/ruff/issues/3259#issuecomment-1454335988
The counterexample isn't for any/all ?