flake8-bugbear False positive B023

When upgrading, some false positives were found for B023 (implimented in https://github.com/PyCQA/flake8-bugbear/pull/265).

Example

"""B023.py"""
def filter_values(values: list[list[int]], max_percentage: float):
    for value in values:
        filter_val = max(value) * max_percentage
        yield list(filter(lambda x: x < filter_val, value))

$ flake8 B023.py
bugbear_me.py:4:41: B023 Function definition does not bind loop variable 'filter_val' is valid, and doesn't fall into the

A silly example here, but one where the use of filter_val is valid.

cc @Zac-HD In the PR you mentioned some hard to detect false positives that you were okay with. Was this kind of pattern one of the ones you had in mind?

Thanks for this check btw, I think it's a good idea. I've fallen for this in the past!

Jul 02 '22 22:07 paw-lu

Yep, this is one of those false alarms! I'd use a list-comprehension instead, which is both faster and IMO more readable:

def filter_values(values: list[list[int]], max_percentage: float):
    for value in values:
        filter_val = max(value) * max_percentage
        yield [x for x in values if x < filter_val]

Unfortunately it's more difficult to avoid this false alarm than you might think: for example "lambdas are OK if they're the first argument in a call to filter()" would miss bugs where the filter iterable (and therefore lambda) isn't immediately evaluated.

Jul 02 '22 23:07 Zac-HD

I'd use a list-comprehension instead, which is both faster and IMO more readable:

Yeah I agree with you there, this was a silly example to illustrate the false postive. Real example this was triggered on is more convoluted (but less silly).

Unfortunately it's more difficult to avoid this false alarm than you might think

Oh yeah, I can see where this gets annoying to impliment. Wondering outloud though if we value reducing false positives or false negatives more? As in would the rule be more useful if it only flagged an error when it was very sure of the problem (the consequence being it lets some problems slide, and we get more false negatives).

For the record, okay if we mark this as a known false positive!

Jul 02 '22 23:07 paw-lu

I'm down to add known false positives into the README to help save people time or maybe even allow for someone smarter to come up with a way to handle these edge cases. Maybe even state that in the README.

Jul 03 '22 15:07 cooperlees

I... don't think the real example is really a false alarm?

(terminology note: I much prefer the "false alarm"/"missed alarm" to the "false positive/negative", because it makes it much easier to tell what you're talking about, especially when the alarm is for a negative outcome.)

    for token in iterator:
        if (token_type := token.type) in open_close_pairs:
            close_tag = open_close_pairs[token_type]
            if token_type == close_tag:
                yield iter([token])
            else:
                yield itertools.chain(
                    (token,),
                    itertools.takewhile(
                        lambda token_: token_.type != close_tag, iterator
                    ),
                )

Unless you consume each yielded iterator before getting the next, you might have the wrong close tag!

Safer to explicitly bind it in lambda token_, ct=close_tag: token_.type != ct 🙂

Jul 03 '22 21:07 Zac-HD

Not the same usecase as described here originally, but I found a false alarm with the following code:

$ cat app.py
from app import db


for name in ['a', 'b']:
    def myfunc(name):
        qry = db.query(name=name)
        db.run(qry)

    myfunc(name=name)
    qry = db.query(description__contains=name)
    db.run(qry)

And when I run flake8:

$ venv/bin/flake8 app.py
app.py:7:16: B023 Function definition does not bind loop variable 'qry'.

Basically - if I have a variable defined within the loop - but the same variable name is also within the function. The local variable shouldn't have any problem in terms of execution ?

But - the rule thinks it is an issue

Jul 04 '22 07:07 AbdealiLoKo

Just ran into one in Jinja https://github.com/pallets/jinja/pull/1681 where if a file in a list of files doesn't exist, we continue the loop, otherwise we create a function that uses the file and return it, ending the loop.

for name in names:
    if not exists(name):
        continue

    def up_to_date():
        if exists(name):
            return mtime(name)

        return False

    return name, up_to_date

raise TemplateNotFound

I refactored it to something along the lines of the following, which I'm fine with, but it did take a bit to figure out how to use break/else instead.

for name in names:
    if exists(name):
        break
else:
    raise TemplateNotFound

def up_to_date():
    if exists(name):
        return mtime(name)

    return False

return name, up_to_date

Jul 04 '22 14:07 davidism

Could also do def up_to_date(name=name): inside the loop, if you prefer.

In principle we could avoid emitting the warning if we were sure that the it only occurs in the last loop iteration; in practice I'm not volunteering to write something that complicated at the moment.

Jul 04 '22 20:07 Zac-HD

Another false positive:

def foo(list_of_lists):
    for l in list_of_lists:
        priorities = calculate_priorities()
        first = min(l, key=lambda x: priorities.index(x))
        do_something(first)

Unlike filter, min (or max and also sorted) is evaluated immediately so there's no possible bug here. I think these should be easier to allow.

Jul 05 '22 10:07 noamkush

Unlike filter, min (or max and also sorted) is evaluated immediately so there's no possible bug here. I think these should be easier to allow.

It's possible that min, max, or sorted has been redefined... but in practice I agree, this would be a nice way to reduce the false-alarm rate.

Jul 07 '22 05:07 Zac-HD

just my 2c, this should maybe be off-by-default as it's difficult-to-impossible to know if the inner lambda / function is used in a deferred manner

there's a bunch of examples similar to this when upgrading in sentry: https://github.com/getsentry/sentry/blob/d89af1dace44c3888e7042d1c4957b9b5d514948/tests/sentry/plugins/bases/test_notify.py#L47-L58

(yes parametrize would be better, we're a bit stuck on unittest things for now :sob:)

Aug 01 '22 14:08 asottile-sentry

this example appears to trigger an unrelated false positive as well:

def get_group_backfill_attributes(caches, group, events):
    return {
        k: v
        for k, v in reduce(
            lambda data, event: merge_mappings(
                [data, {name: f(caches, data, event) for name, f in backfill_fields.items()}]
            ),
            events,
            {
                name: getattr(group, name)
                for name in set(initial_fields.keys()) | set(backfill_fields.keys())
            },
        ).items()
        if k in backfill_fields
    }

(yes the code is incomprehensible FP nonsense, but the name variable is being confused with an unrelated name variable in a different comprehension)

Aug 01 '22 14:08 asottile-sentry

def main():
    numbers = list(range(5))
    for number in numbers:
        one_hundred_list = list(range(100))
        squared = next(filter(lambda x: x == number**2, one_hundred_list))
        print(squared)  # This will print 0, 1, 4, 9, 16


if __name__ == "__main__":
    main()

Using filter() with next can also create a false positive

Aug 02 '22 23:08 ecs-jnguyen

Another false positive here: https://github.com/pimutils/vdirsyncer/blob/c50eabc77e3c30b1d20cd4fa9926459e4a081dc1/tests/storage/servers/davical/init.py#L42

tests/storage/servers/davical/init.py:42:50: B023 Function definition does not bind loop variable 's'.

Aug 03 '22 09:08 WhyNotHugo

My false-positive example:

def foo(forbidden_calls, forbidden_functions, strings):
    for fn in forbidden_calls:
        f = forbidden_functions[fn]
        waiver = any(map(lambda string: f['waiver_regex'].search(string), strings))

    return waiver

flake8 /tmp/demo.py 
/tmp/demo.py:4:41: B023 Function definition does not bind loop variable 'f'.

Aug 29 '22 14:08 marxin

I expect it'll be quite a while before I get back to this again, so here: https://github.com/PyCQA/flake8-bugbear/compare/main...Zac-HD:flake8-bugbear:B023-false-alarms

This patch fixes the false alarm on functions where a local variable shadows the loop variable, and adds regression tests for the more common key=/filter/reduce -based false alarms but doesn't fix those yet. If anyone would like to finish this and make PR, do so with my blessing!

(and @cooperlees if you're making a release anyway, cherry-pick the partial fix?)

Oct 17 '22 06:10 Zac-HD

I'm on it!

Oct 25 '22 10:10 jakkdl

The PR should fix a bunch of the false alarms mentioned in here, but probably not all of them. Feel free to check it out

Oct 25 '22 11:10 jakkdl

My false-positive example:

def foo(forbidden_calls, forbidden_functions, strings):
    for fn in forbidden_calls:
        f = forbidden_functions[fn]
        waiver = any(map(lambda string: f['waiver_regex'].search(string), strings))

    return waiver

This issue is still present with the latest release and main branch.

Oct 26 '22 07:10 marxin

As written, you skip all but the last loop iteration, since waiver is reassigned every time! Did you mean waiver = waiver or any(...)? Other tricks to fix the warning include adding , f=f: to the lambda args, or avoiding a lambda entirely with:

for fn in forbidden_calls:
    searcher = forbidden_functions[fn]['waiver_regex'].search
    waiver = any(map(searcher, strings))

But more generally, yeah, we should handle map the same way as filter and reduce - thanks for the re-report 😅 @jakkdl mind adding that and copy-and-editing some of the existing tests? My bad for omitting them in the first place...

Oct 26 '22 08:10 Zac-HD

As written, you skip all but the last loop iteration, since waiver is reassigned every time!

Whoops, you are right, my code snippet is really suspicious.

Oct 26 '22 08:10 marxin

Done~ :slightly_smiling_face:

Oct 26 '22 09:10 jakkdl

This sample still fails. Shall II open a separate issue for it?

for shipment in sender.shipments:
    transaction.on_commit(lambda: update_single_me_shipment.delay(shipment.id)

Oct 27 '22 16:10 WhyNotHugo

That's not a false alarm; unless the lambdas run immediately every single one will use the last shipment.

Oct 27 '22 17:10 Zac-HD

Further comments to new issues please, we've fixed all the false alarms we know of.

We've also received several reports where the linter is in fact correct, so consider refactoring your code to make it happy even if you think it's probably a false alarm!

Oct 27 '22 17:10 Zac-HD