flake8-comprehensions
Add rule to recommend str.join take a list comp over a generator
"".join()
accepts generators
>>> ",".join(str(i) for i in range(3))
'0,1,2'
Indeed it can. I think this will require a bit more code to detect than for the existing rules - we'd have to search for a string object, attribute access, and then function call, rather than just a function name. If you want to look into a PR please feel welcome :)
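For reference, here's a minimal sketch of what that detection could look like with the stdlib ast module (the visitor class and names are illustrative, not the plugin's actual implementation):

import ast

class JoinGeneratorVisitor(ast.NodeVisitor):
    """Flags calls of the form <string literal>.join(<generator expression>)."""

    def __init__(self):
        self.findings = []  # (lineno, col_offset) of each match

    def visit_Call(self, node):
        # Match: a string constant, then a .join attribute access, then a call
        # whose single positional argument is a generator expression.
        if (
            isinstance(node.func, ast.Attribute)
            and node.func.attr == "join"
            and isinstance(node.func.value, ast.Constant)
            and isinstance(node.func.value.value, str)
            and len(node.args) == 1
            and isinstance(node.args[0], ast.GeneratorExp)
        ):
            self.findings.append((node.lineno, node.col_offset))
        self.generic_visit(node)

tree = ast.parse('",".join(str(i) for i in range(3))')
visitor = JoinGeneratorVisitor()
visitor.visit(tree)
print(visitor.findings)  # [(1, 0)]

Restricting the match to a string constant keeps it to literal receivers, per the "string object, attribute access, then function call" description above; it would miss join called on a plain variable.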
It would be nice if this checker could detect all functions that accept typing.Iterable
Since flake8 doesn't do any typing, we're restricted to builtins here.
Actually, the linter should enforce ",".join([str(i) for i in range(3)]).
This is faster because join internally converts generators to lists.
See here: https://stackoverflow.com/a/9061024/833093
Yes Tom, thanks!
Actually the linter should enforce
",".join([str(i) for i in range(3)])
this is faster because join internally converts all generators to lists
It seems it is not much faster these days (the original measurement in SO is 8 years old)
Still a tiny bit faster - Python 3.8.2:
In [1]: %timeit ",".join([str(i) for i in range(3)])
1.01 µs ± 37.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [2]: %timeit ",".join([str(i) for i in range(300)])
55.3 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [3]: %timeit ",".join((str(i) for i in range(3)))
1.15 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit ",".join((str(i) for i in range(300)))
61 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You compared a list to a tuple, but the issue is about a generator (see the description).
Try %timeit ",".join(str(i) for i in range(*))
(str(i) for i in range(3))
is a generator. It just has extra brackets.
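You can confirm the extra parentheses don't change the type of what join receives:
>>> type((str(i) for i in range(3)))
<class 'generator'>
>>> type([str(i) for i in range(3)])
<class 'list'>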
Python 3.11.2 still shows this perf difference:
In [1]: %timeit ",".join([str(i) for i in range(3)])
322 ns ± 1.13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [2]: %timeit ",".join([str(i) for i in range(300)])
19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [3]: %timeit ",".join((str(i) for i in range(3)))
407 ns ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [4]: %timeit ",".join((str(i) for i in range(300)))
22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
I think we could add another rule.
Since flake8 doesn't do any typing, we're restricted to builtins here.
@hmvp @adamchainz Python >=3.8's stdlib ast module actually supports getting the type of functions (although it's not that well documented): https://stackoverflow.com/a/70143560/2444240 Since I assume 3.7 support will be dropped soon, we can look into adding this (although I would be fine with restricting it to methods from the stdlib).
Yeah, that doesn't help that much - for example, you can't get the type of imported symbols, or the implied types of variables.
In [2]: %timeit ",".join([str(i) for i in range(300)])
19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit ",".join((str(i) for i in range(300)))
22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
A ~10% difference on such tiny timings is statistically insignificant, and hardly grounds to add a new rule and nag developers about it. If both approaches take roughly the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.
Or at least make it into an opinionated warning, in the C9xx range.
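If it does land as an opinionated check, anyone who disagrees could opt out via flake8's extend-ignore; the snippet below uses C9xx as a placeholder since no code has actually been allocated:

# setup.cfg (or .flake8); C9xx stands in for whatever code the rule is given
[flake8]
extend-ignore = C9xx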
The gap has grown significantly in Python 3.12 thanks to comprehension inlining (PEP 709):
In [1]: %timeit ",".join([str(i) for i in range(3)])
235 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [2]: %timeit ",".join([str(i) for i in range(300)])
16.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [3]: %timeit ",".join((str(i) for i in range(3)))
394 ns ± 5.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [4]: %timeit ",".join((str(i) for i in range(300)))
20.7 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Now about a 40% difference on the small case and 20% on the larger one.
If both approaches take somewhat the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.
They don't take the same time and memory; that is known. The implementation of join first converts the generator into a list.
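For context, a rough pure-Python approximation of that step (the real implementation is C code in CPython's unicodeobject.c; the function name below is made up for illustration):

def join_sketch(sep, iterable):
    # CPython's str.join calls PySequence_Fast, which keeps lists and tuples
    # as-is but copies any other iterable (including a generator) into a list,
    # because join needs two passes over the items: one to size the result and
    # one to copy. A generator argument therefore just adds an extra
    # list-building step on top of what a list comprehension would have done.
    if not isinstance(iterable, (list, tuple)):
        iterable = list(iterable)
    parts = []
    for i, item in enumerate(iterable):
        if i:
            parts.append(sep)
        parts.append(item)
    return "".join(parts)  # stand-in for the C-level copy into the result buffer

print(join_sketch(",", (str(i) for i in range(3))))  # prints 0,1,2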