flake8-comprehensions icon indicating copy to clipboard operation
flake8-comprehensions copied to clipboard

Add rule to recommend str.join takes a list comp over generator

Open hmvp opened this issue 5 years ago • 15 comments

"".join() accepts generators

>>> ",".join(str(i) for i in range(3))
'0,1,2'

hmvp avatar Jun 06 '19 09:06 hmvp

Indeed it can. I think this will require a bit more code to detect than for the existing rules - we'd have to search for a string object, attribute access, and then function call, rather than just a function name. If you want to look into a PR please feel welcome :)

adamchainz avatar Jun 07 '19 15:06 adamchainz

It would be nice if this checker could detect all functions that accept typing.Iterable

hmvp avatar Jun 13 '19 13:06 hmvp

Since flake8 doesn't do any typing, we're restricted to builtins here.

adamchainz avatar Jun 13 '19 13:06 adamchainz

Actually the linter should enforce ",".join([str(i) for i in range(3)]) this is faster because join internally converts all generators to lists

graingert avatar Aug 28 '19 12:08 graingert

See here: https://stackoverflow.com/a/9061024/833093

graingert avatar Aug 28 '19 12:08 graingert

Yes Tom, thanks!

adamchainz avatar Aug 28 '19 13:08 adamchainz

Actually the linter should enforce ",".join([str(i) for i in range(3)]) this is faster because join internally converts all generators to lists

It seems it is not much faster these days (the original measurement in SO is 8 years old)

dvorapa avatar Apr 05 '20 13:04 dvorapa

Still a tiny bit faster - Python 3.8.2:

In [1]: %timeit ",".join([str(i) for i in range(3)])
1.01 µs ± 37.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %timeit ",".join([str(i) for i in range(300)])
55.3 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
1.15 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
61 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

adamchainz avatar Apr 05 '20 13:04 adamchainz

You compared list to tuple, but the issue is about generator (see the description)

Try %timeit ",".join(str(i) for i in range(*))

dvorapa avatar Apr 05 '20 14:04 dvorapa

(str(i) for i in range(3)) is a genertaor. It just has extra brackets.

adamchainz avatar Apr 05 '20 14:04 adamchainz

Python 3.11.2 still shows this perf difference:

In [1]: %timeit ",".join([str(i) for i in range(3)])
322 ns ± 1.13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [2]:  %timeit ",".join([str(i) for i in range(300)])
19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
407 ns ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I think we could add another rule.

adamchainz avatar Mar 19 '23 08:03 adamchainz

Since flake8 doesn't do any typing, we're restricted to builtins here.

@hmvp @adamchainz Python >=3.8's stdlib ast module actually supports getting the type of functions (although it's not that well documented): https://stackoverflow.com/a/70143560/2444240 Since I assume 3.7 support will be dropped soon, we can look into adding this, (although I would be fine with restricting it to methods from stdlib).

Skylion007 avatar Mar 19 '23 17:03 Skylion007

Yaeh that doesn't help that much - for example, you can't get the type of imported symbols, or the implied types of variables.

adamchainz avatar Mar 20 '23 15:03 adamchainz

In [2]: %timeit ",".join([str(i) for i in range(300)]) 19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) In [4]: %timeit ",".join((str(i) for i in range(300))) 22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

A 10% difference with such tiny numbers is a statistical insignificance. Hardly grounds to make a new rule and nag developers about it. If both approaches take somewhat the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.

Or at least make it into an opinionated warning, in the C9xx range.

joaoe avatar Jun 30 '23 12:06 joaoe

The gap has grown significantly in Python 3.12 thanks to Comprehension inlining:

In [1]: %timeit ",".join([str(i) for i in range(3)])
235 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [2]: %timeit ",".join([str(i) for i in range(300)])
16.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
394 ns ± 5.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
20.7 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Now about a 40% difference on the small case and 20% on the larger one.

If both approaches take somewhat the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.

They don't take the same time and memory, that is known. The implementation of join first converts the generator into a list.

adamchainz avatar Jul 03 '23 10:07 adamchainz