ruff icon indicating copy to clipboard operation
ruff copied to clipboard

Rule Suggestion `min`/`max` instead of `sorted(l)[0]`

Open Skylion007 opened this issue 11 months ago • 9 comments

Saw this in code:

a = sorted(l)[0]
a = sorted(l, reverse=True)[-1] # not sure about this one....

should be

a = min(l)

Additionally, the following code pattern can also be simplified

a = sorted(l, reverse=True)[0] 
a = sorted(l)[-1] # not sure about this, sort stability could make it output a different value, right?

should be

a = max(l)

Is there already a pylint rule for this? I wouldn't be surprised. If not, let's add it as a RUFF rule.

I am not sure about the [-1] variants, as there may be some subtlety there with sort stability?

Skylion007 avatar Mar 18 '24 21:03 Skylion007

This seems nice for my first rule contribution. It occurs way more than I was expecting from this GitHub search, though some of these are not actual cases of this violation.

min() and max() support a key argument as well, so the fix would forward that to these functions.

Is there already a pylint rule for this? I wouldn't be surprised. If not, let's add it as a RUFF rule.

Not from a quick search in their docs.

@charliermarsh do you think this is something I could tackle? Any guidance would be nice but not necessary until I hit roadblocks.

Performance impact of fixing this violation

Test script

import timeit
import random

def method_sorted(a):
    return sorted(a)[0]

def method_min(a):
    return min(a)

sizes = [10**i for i in range(1, 5)]  # Sizes: 10, 100, 1000, ...
number_of_executions = 1000

for size in sizes:
    a = [random.randint(1, 1000) for _ in range(size)]
    
    time_sorted = timeit.timeit(lambda: method_sorted(a), number=number_of_executions)
    time_min = timeit.timeit(lambda: method_min(a), number=number_of_executions)
    
    speedup = time_sorted / time_min
    print(f"Size: {size}, sorted(a)[0] time: {time_sorted:.5f}s, min(a) time: {time_min:.5f}s, Speedup: {speedup:.2f}x")

On my M1 pro:

Size: 10, sorted(a)[0] time: 0.00024s, min(a) time: 0.00021s, Speedup: 1.17x
Size: 100, sorted(a)[0] time: 0.00279s, min(a) time: 0.00118s, Speedup: 2.36x
Size: 1000, sorted(a)[0] time: 0.05002s, min(a) time: 0.01093s, Speedup: 4.58x
Size: 10000, sorted(a)[0] time: 0.88557s, min(a) time: 0.10932s, Speedup: 8.10x

Results are less impressive, but still good, with key=lambda x: x:

Size: 10, sorted(a)[0] time: 0.00067s, min(a) time: 0.00065s, Speedup: 1.03x
Size: 100, sorted(a)[0] time: 0.00629s, min(a) time: 0.00433s, Speedup: 1.45x
Size: 1000, sorted(a)[0] time: 0.08372s, min(a) time: 0.04047s, Speedup: 2.07x
Size: 10000, sorted(a)[0] time: 1.23920s, min(a) time: 0.40743s, Speedup: 3.04x

ottaviohartman avatar Mar 18 '24 23:03 ottaviohartman

Just implemented this as FURB192 in Refurb: https://github.com/dosisod/refurb/pull/333

dosisod avatar Mar 20 '24 06:03 dosisod

Python sort and sorted are guaranteed to be stable, so it could have problem like this:

from operator import itemgetter
data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

min(data, key=itemgetter(0))  # ('blue', 1)
sorted(data, key=itemgetter(0))[0]  # ('blue', 1)
sorted(data, key=itemgetter(0), reverse=True)[-1]  # ('blue, 2')

In this case, the order of ('red', 1), ('red', 2) and ('blue', 1), ('blue', 2) is retained for both sorted call, which cause the result of the two to become different. So we need to be careful when adding the rule for [-1], though I think it is not very frequent to intentionally use sorted function that way.

boolean-light avatar Mar 20 '24 07:03 boolean-light

I would be okay documenting that as a caveat of the rule, seems like a weird edge-case.

zanieb avatar Mar 20 '24 14:03 zanieb

We should probably flag those unstable sorts as unsafe fixes if we go through with them.

Skylion007 avatar Mar 22 '24 15:03 Skylion007

How can we identify them as unstable?

zanieb avatar Mar 22 '24 16:03 zanieb

We should test it out using the example provided above but I think only the '[-1]' index pattern is unstable.

Skylion007 avatar Mar 30 '24 15:03 Skylion007

Actually to clarify, all these sorts are stable, but reverse=True reverse the order unstable elements apparently. (And reverse=True, '[-1]'). So only the first proposed fix using min (and it's reversed version) will always be guaranteed to be the same. I suppose. Maybe that warrants its own rule in that case or a rule toggle not flag it when it could change the returned value on ties.

Skylion007 avatar Mar 30 '24 15:03 Skylion007

I'd like to start working on this if nobody has started yet :)

ottaviohartman avatar Apr 02 '24 00:04 ottaviohartman