ruff icon indicating copy to clipboard operation
ruff copied to clipboard

[`wemake_python_styleguide`] Implement `ControlVarUsedAfterBlockViolation` (`WPS441`)

Open MadLittleMods opened this issue 1 year ago • 1 comments

Summary

Implement ControlVarUsedAfterBlockViolation (WPS441). This rule prevents you from using for-loop variables outside of the loop block (or a with context variable).

Example:

# For control var used outside block
for event in []:
    _ = event
    pass

# ❌
_ = event

# with statement
with None as w:
    _ = w
    pass

# ❌
_ = w

Part of https://github.com/astral-sh/ruff/issues/3845

This is spawning from a real-world pain where I accidentally used one for-loop control variable in another loop, see https://github.com/element-hq/synapse/pull/17187#discussion_r1612211662

Please bear in mind that this is my first Rust code so I'm probably doing a few things wrong :bow:. Shout out to @reivilibre who helped me in the beginning!

Test Plan

Tested against the fixture file:

cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441

Also tested against the https://github.com/element-hq/synapse codebase. It does trigger in several spots that work because Python works this way but could be refactored to align with this rule. Some of the usages seem slightly sketchy but it requires some more context/time to determine whether they are bugs in the Synapse code.

Here is a real world bug that it caught: https://github.com/element-hq/synapse/pull/17272

Dev notes

Running ruff against your test file with your new rule:

cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441

Getting started

Running the pre-commit hooks without installing to your system:

python -m venv /home/eric/Downloads/ruff-venv
source /home/eric/Downloads/ruff-venv/bin/activate
pip install pre-commit

Debug NodeId/node_id

Add the following method to crates/ruff_python_semantic/src/model.rs and use println!("nodes {:#?}", checker.semantic().nodes());. It would be really nice if the node index/ID was printed next to each one but you can manually number things to get by.

    #[inline]
    pub fn nodes(&self) -> &Nodes<'a> {
        &self.nodes
    }

Inspect the AST

cargo dev print-ast crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py

Helpful related ruff rules

  • unused_variable
  • redefined_loop_name
  • too_many_nested_blocks

MadLittleMods avatar Jun 06 '24 01:06 MadLittleMods

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+117 -0 violations, +0 -0 fixes in 15 projects; 39 projects unchanged)

DisnakeDev/disnake (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ disnake/message.py:1104:28: RUF032 `for` loop variable index is used outside of block
+ disnake/message.py:1105:16: RUF032 `for` loop variable reaction is used outside of block
+ scripts/codemods/typed_permissions.py:84:33: RUF032 `for` loop variable b is used outside of block
RasaHQ/rasa (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ tests/shared/nlu/training_data/test_features.py:234:46: RUF032 `for` loop variable idx is used outside of block
+ tests/shared/nlu/training_data/test_features.py:234:67: RUF032 `for` loop variable idx is used outside of block
apache/airflow (+25 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ airflow/api/__init__.py:46:76: RUF032 `for` loop variable backend is used outside of block
+ airflow/models/param.py:284:67: RUF032 `for` loop variable k is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:150:64: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:154:41: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/amazon/aws/sensors/s3.py:159:50: RUF032 `for` loop variable key is used outside of block
+ airflow/providers/apache/hive/hooks/hive.py:1015:59: RUF032 `for` loop variable i is used outside of block
+ airflow/providers/cncf/kubernetes/utils/pod_manager.py:476:60: RUF032 `for` loop variable line is used outside of block
+ airflow/providers/cncf/kubernetes/utils/pod_manager.py:479:60: RUF032 `for` loop variable line is used outside of block
+ airflow/providers/docker/operators/docker_swarm.py:221:32: RUF032 `for` loop variable line is used outside of block
+ airflow/timetables/events.py:103:47: RUF032 `for` loop variable next_event is used outside of block
... 15 additional changes omitted for project
apache/superset (+8 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ superset/cli/test_db.py:235:8: RUF032 `for` loop variable spec is used outside of block
+ superset/cli/test_db.py:238:21: RUF032 `for` loop variable spec is used outside of block
+ superset/cli/test_db.py:269:12: RUF032 `for` loop variable spec is used outside of block
+ superset/migrations/versions/2017-10-03_14-37_4736ec66ce19_.py:204:19: RUF032 `for` loop variable foreign is used outside of block
+ superset/migrations/versions/2020-08-12_00-24_978245563a02_migrate_iframe_to_dash_markdown.py:191:40: RUF032 `for` loop variable dashboard is used outside of block
+ superset/sql_parse.py:1539:53: RUF032 `for` loop variable dialect is used outside of block
+ superset/tasks/cache.py:269:31: RUF032 `for` loop variable class_ is used outside of block
+ superset/tasks/cache.py:271:20: RUF032 `for` loop variable class_ is used outside of block
bokeh/bokeh (+13 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ src/bokeh/application/handlers/document_lifecycle.py:70:13: RUF032 `for` loop variable callback is used outside of block
+ src/bokeh/embed/util.py:276:27: RUF032 `for` loop variable elementid is used outside of block
+ src/bokeh/embed/util.py:276:38: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/embed/util.py:276:47: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/embed/util.py:276:58: RUF032 `for` loop variable root is used outside of block
+ src/bokeh/plotting/_figure.py:483:13: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/plotting/_figure.py:484:46: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/plotting/_figure.py:488:35: RUF032 `for` loop variable kw is used outside of block
+ src/bokeh/util/compiler.py:116:38: RUF032 `for` loop variable i is used outside of block
+ tests/codebase/test_code_quality.py:112:12: RUF032 `for` loop variable line is used outside of block
... 3 additional changes omitted for project
ibis-project/ibis (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ ibis/backends/bigquery/tests/unit/udf/test_core.py:181:16: RUF032 `for` loop variable i is used outside of block
milvus-io/pymilvus (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ examples/milvus_client/rbac.py:97:17: RUF032 `for` loop variable role is used outside of block
pandas-dev/pandas (+18 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ pandas/__init__.py:19:25: RUF032 `for` loop variable _dependency is used outside of block
+ pandas/io/common.py:599:8: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/common.py:600:16: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/common.py:604:43: RUF032 `for` loop variable compression is used outside of block
+ pandas/io/formats/excel.py:668:25: RUF032 `for` loop variable lnum is used outside of block
+ pandas/io/formats/excel.py:673:29: RUF032 `for` loop variable lnum is used outside of block
+ pandas/io/formats/excel.py:678:27: RUF032 `for` loop variable lnum is used outside of block
+ pandas/plotting/_matplotlib/boxplot.py:557:16: RUF032 `for` loop variable ax is used outside of block
+ pandas/tests/frame/indexing/test_take.py:48:13: RUF032 `for` loop variable df is used outside of block
+ pandas/tests/frame/indexing/test_take.py:50:13: RUF032 `for` loop variable df is used outside of block
... 8 additional changes omitted for project
python-poetry/poetry (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/poetry/masonry/builders/editable.py:182:27: RUF032 `for` loop variable scripts_path is used outside of block
+ src/poetry/masonry/builders/editable.py:184:64: RUF032 `for` loop variable scripts_path is used outside of block
+ src/poetry/masonry/builders/editable.py:207:28: RUF032 `for` loop variable scripts_path is used outside of block
+ tests/utils/env/test_env.py:303:37: RUF032 `for` loop variable dist is used outside of block
rotki/rotki (+7 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ rotkehlchen/api/rest.py:559:49: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/chain/zksync_lite/manager.py:828:25: RUF032 `for` loop variable target is used outside of block
+ rotkehlchen/tests/db/test_db_upgrades.py:2603:12: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/integration/test_zksynclite.py:108:25: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/integration/test_zksynclite.py:91:25: RUF032 `for` loop variable idx is used outside of block
+ rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:75: RUF032 `for` loop variable key is used outside of block
+ rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:84: RUF032 `for` loop variable packaged_field is used outside of block
zulip/zulip (+10 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ zerver/actions/streams.py:628:16: RUF032 `for` loop variable stream_id is used outside of block
+ zerver/actions/streams.py:631:71: RUF032 `for` loop variable stream_id is used outside of block
+ zerver/lib/markdown/__init__.py:1036:75: RUF032 `for` loop variable size_name is used outside of block
+ zerver/tests/test_auth_backends.py:239:32: RUF032 `for` loop variable backend_name is used outside of block
+ zerver/tests/test_auth_backends.py:256:32: RUF032 `for` loop variable backend_name is used outside of block
+ zerver/tests/test_message_send.py:1686:30: RUF032 `for` loop variable user is used outside of block
+ zerver/tests/test_message_send.py:1699:30: RUF032 `for` loop variable user is used outside of block
+ zerver/tests/test_user_topics.py:124:44: RUF032 `for` loop variable data is used outside of block
+ zerver/tests/test_user_topics.py:129:9: RUF032 `for` loop variable data is used outside of block
+ zerver/tests/test_user_topics.py:131:48: RUF032 `for` loop variable data is used outside of block
indico/indico (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ bin/utils/storage_checksums.py:43:12: RUF032 `for` loop variable row is used outside of block
+ bin/utils/storage_checksums.py:46:57: RUF032 `for` loop variable row is used outside of block
python-trio/trio (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ notes-to-self/socketpair-buffering.py:35:44: RUF032 `for` loop variable _count is used outside of block
+ src/trio/_tests/check_type_completeness.py:106:90: RUF032 `for` loop variable obj_name is used outside of block
pytest-dev/pytest (+20 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/_pytest/assertion/rewrite.py:492:40: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:492:53: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:492:66: RUF032 `for` loop variable i is used outside of block
+ src/_pytest/assertion/rewrite.py:495:12: RUF032 `for` loop variable expl is used outside of block
+ src/_pytest/assertion/rewrite.py:713:23: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:713:50: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:714:22: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/rewrite.py:716:22: RUF032 `for` loop variable item is used outside of block
+ src/_pytest/assertion/truncate.py:111:37: RUF032 `for` loop variable iterated_index is used outside of block
+ src/_pytest/assertion/truncate.py:112:30: RUF032 `for` loop variable iterated_index is used outside of block
... 10 additional changes omitted for project

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
RUF032 117 117 0 0 0

github-actions[bot] avatar Jun 06 '24 01:06 github-actions[bot]

We discussed this internally and we agreed on:

  • Using the for introduced variable outside of a loop is likely a bug. There are cases where this is intentional, but that's when using suppressions is fine.
  • Using the with introduced variable after the with block is more common and often acceptable. @AlexWaygood would you mind sharing an example?
  • There's a similar problem related to variables introduced by match case
    import random
    
    match random.choice([True, False]):
        case True: False
        case x:
            y = 2
    
    
    print(x)
    

The fact that for is most likely an error and with is not justifies that this rule should be split into two rules. I suggest that we start with the for loop and add it to the RUF ruleset.

MichaReiser avatar Jul 29 '24 06:07 MichaReiser

Using the with introduced variable after the with block is more common and often acceptable. @AlexWaygood would you mind sharing an example?

A commit merged to the CPython repo earlier this morning has several examples in it: https://github.com/python/cpython/commit/0697188084bf61b28f258fbbe867e1010d679b3e.

Here's a slightly simpler example than any of the functions that commit touched, from CPython's dataclasses test suite: https://github.com/python/cpython/blob/9187484dd97f6beb94fc17676014706922e380e1/Lib/test/test_dataclasses/init.py#L3121C1-L3138C68:

import unittest
from dataclasses import dataclass

class FrozenTests(unittest.TestCase):
    def test_non_frozen_normal_derived_from_empty_frozen(self):
        @dataclass(frozen=True)
        class D:
            pass

        class S(D):
            pass

        s = S()
        self.assertFalse(hasattr(s, 'x'))
        s.x = 5
        self.assertEqual(s.x, 5)

        del s.x
        self.assertFalse(hasattr(s, 'x'))
        with self.assertRaises(AttributeError) as cm:
            del s.x
        self.assertNotIsInstance(cm.exception, FrozenInstanceError)

The with self.assertRaises block is kept as small as possible, because the author of the test wanted to assert that the AttributeError exception is raised on exactly one line: the del s.x call. If an AttributeError was raised on another line, that would indicate a bug in the test or in the library; that should fail. So the with block is only one line long; but you want to save the cm binding that's created in the with statement so that you can do some extra assertions on it after the with block has ended.

I think tests are the places where this comes up most often. You also see this pattern a lot with pytest-style tests that use with pytest.raises() context managers; it's not just unittest-style tests.

AlexWaygood avatar Jul 29 '24 13:07 AlexWaygood

I suggest that we start with the for loop and add it to the RUF ruleset.

Updated :white_check_mark:

MadLittleMods avatar Jul 30 '24 04:07 MadLittleMods

CodSpeed Performance Report

Merging #11769 will degrade performances by 5.26%

Comparing MadLittleMods:madlittlemods/control-var-used-after-block (9bcc4fe) with main (c906b01)

Summary

❌ 1 regression
✅ 31 untouched

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Mode Benchmark BASE HEAD Change
Simulation linter/all-rules[numpy/globals.py] 726.2 µs 766.5 µs -5.26%

codspeed-hq[bot] avatar Aug 02 '24 09:08 codspeed-hq[bot]

Some interesting cases from the ecosystem check

Variable declared outside the loop

I think the code here is correct because the variable is first defined outside the loop as being None.

https://github.com/indico/indico/blob/a22e0c0a9c0cf4d40d04029489dc530b4819d2ea/bin/utils/storage_checksums.py#L39-L43

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/io/common.py#L581-L599

(many more)

An easy fix here could be to skip the rule if a binding for the same name was already defined before the for loop. But it may also be intentional that we catch this as part of this rule. What do you think?

Use in combination with try..except

It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with try..except

https://github.com/python-trio/trio/blob/c9baca6045d427fd4cc0de2ca396ddeb4a0f40a5/src/trio/_tests/check_type_completeness.py#L62-L109

Multiple usages in the same statement

It would be great if we only flagged each variable once (at least per statement?)

**Delete statements ** This is interesting. We may explicitly want to allow usages in delete statements

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/init.py#L19

Control flow

This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/plotting/_matplotlib/boxplot.py#L520-L561 https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/embed/util.py#L266-L276 https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/plotting/_figure.py#L476-L486

MichaReiser avatar Aug 09 '24 08:08 MichaReiser