ruff
ruff copied to clipboard
[`wemake_python_styleguide`] Implement `ControlVarUsedAfterBlockViolation` (`WPS441`)
Summary
Implement ControlVarUsedAfterBlockViolation (WPS441). This rule prevents you from using for-loop variables outside of the loop block (or a with context variable).
Example:
# For control var used outside block
for event in []:
_ = event
pass
# ❌
_ = event
# with statement
with None as w:
_ = w
pass
# ❌
_ = w
Part of https://github.com/astral-sh/ruff/issues/3845
This is spawning from a real-world pain where I accidentally used one for-loop control variable in another loop, see https://github.com/element-hq/synapse/pull/17187#discussion_r1612211662
Please bear in mind that this is my first Rust code so I'm probably doing a few things wrong :bow:. Shout out to @reivilibre who helped me in the beginning!
Test Plan
Tested against the fixture file:
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441
Also tested against the https://github.com/element-hq/synapse codebase. It does trigger in several spots that work because Python works this way but could be refactored to align with this rule. Some of the usages seem slightly sketchy but it requires some more context/time to determine whether they are bugs in the Synapse code.
Here is a real world bug that it caught: https://github.com/element-hq/synapse/pull/17272
Dev notes
Running ruff against your test file with your new rule:
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py --no-cache --select WPS441
Getting started
- How to get the project running ->
CONTRIBUTING.md#prerequisites - How to add a new lint rule ->
CONTRIBUTING.md#example-adding-a-new-lint-rule
Running the pre-commit hooks without installing to your system:
python -m venv /home/eric/Downloads/ruff-venv
source /home/eric/Downloads/ruff-venv/bin/activate
pip install pre-commit
Debug NodeId/node_id
Add the following method to crates/ruff_python_semantic/src/model.rs and use println!("nodes {:#?}", checker.semantic().nodes());. It would be really nice if the node index/ID was printed next to each one but you can manually number things to get by.
#[inline]
pub fn nodes(&self) -> &Nodes<'a> {
&self.nodes
}
Inspect the AST
cargo dev print-ast crates/ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py
Helpful related ruff rules
unused_variableredefined_loop_nametoo_many_nested_blocks
ruff-ecosystem results
Linter (stable)
✅ ecosystem check detected no linter changes.
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+117 -0 violations, +0 -0 fixes in 15 projects; 39 projects unchanged)
DisnakeDev/disnake (+3 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ disnake/message.py:1104:28: RUF032 `for` loop variable index is used outside of block + disnake/message.py:1105:16: RUF032 `for` loop variable reaction is used outside of block + scripts/codemods/typed_permissions.py:84:33: RUF032 `for` loop variable b is used outside of block
RasaHQ/rasa (+2 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ tests/shared/nlu/training_data/test_features.py:234:46: RUF032 `for` loop variable idx is used outside of block + tests/shared/nlu/training_data/test_features.py:234:67: RUF032 `for` loop variable idx is used outside of block
apache/airflow (+25 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
+ airflow/api/__init__.py:46:76: RUF032 `for` loop variable backend is used outside of block + airflow/models/param.py:284:67: RUF032 `for` loop variable k is used outside of block + airflow/providers/amazon/aws/sensors/s3.py:150:64: RUF032 `for` loop variable key is used outside of block + airflow/providers/amazon/aws/sensors/s3.py:154:41: RUF032 `for` loop variable key is used outside of block + airflow/providers/amazon/aws/sensors/s3.py:159:50: RUF032 `for` loop variable key is used outside of block + airflow/providers/apache/hive/hooks/hive.py:1015:59: RUF032 `for` loop variable i is used outside of block + airflow/providers/cncf/kubernetes/utils/pod_manager.py:476:60: RUF032 `for` loop variable line is used outside of block + airflow/providers/cncf/kubernetes/utils/pod_manager.py:479:60: RUF032 `for` loop variable line is used outside of block + airflow/providers/docker/operators/docker_swarm.py:221:32: RUF032 `for` loop variable line is used outside of block + airflow/timetables/events.py:103:47: RUF032 `for` loop variable next_event is used outside of block ... 15 additional changes omitted for project
apache/superset (+8 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
+ superset/cli/test_db.py:235:8: RUF032 `for` loop variable spec is used outside of block + superset/cli/test_db.py:238:21: RUF032 `for` loop variable spec is used outside of block + superset/cli/test_db.py:269:12: RUF032 `for` loop variable spec is used outside of block + superset/migrations/versions/2017-10-03_14-37_4736ec66ce19_.py:204:19: RUF032 `for` loop variable foreign is used outside of block + superset/migrations/versions/2020-08-12_00-24_978245563a02_migrate_iframe_to_dash_markdown.py:191:40: RUF032 `for` loop variable dashboard is used outside of block + superset/sql_parse.py:1539:53: RUF032 `for` loop variable dialect is used outside of block + superset/tasks/cache.py:269:31: RUF032 `for` loop variable class_ is used outside of block + superset/tasks/cache.py:271:20: RUF032 `for` loop variable class_ is used outside of block
bokeh/bokeh (+13 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
+ src/bokeh/application/handlers/document_lifecycle.py:70:13: RUF032 `for` loop variable callback is used outside of block + src/bokeh/embed/util.py:276:27: RUF032 `for` loop variable elementid is used outside of block + src/bokeh/embed/util.py:276:38: RUF032 `for` loop variable root is used outside of block + src/bokeh/embed/util.py:276:47: RUF032 `for` loop variable root is used outside of block + src/bokeh/embed/util.py:276:58: RUF032 `for` loop variable root is used outside of block + src/bokeh/plotting/_figure.py:483:13: RUF032 `for` loop variable kw is used outside of block + src/bokeh/plotting/_figure.py:484:46: RUF032 `for` loop variable kw is used outside of block + src/bokeh/plotting/_figure.py:488:35: RUF032 `for` loop variable kw is used outside of block + src/bokeh/util/compiler.py:116:38: RUF032 `for` loop variable i is used outside of block + tests/codebase/test_code_quality.py:112:12: RUF032 `for` loop variable line is used outside of block ... 3 additional changes omitted for project
ibis-project/ibis (+1 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ ibis/backends/bigquery/tests/unit/udf/test_core.py:181:16: RUF032 `for` loop variable i is used outside of block
milvus-io/pymilvus (+1 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ examples/milvus_client/rbac.py:97:17: RUF032 `for` loop variable role is used outside of block
pandas-dev/pandas (+18 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ pandas/__init__.py:19:25: RUF032 `for` loop variable _dependency is used outside of block + pandas/io/common.py:599:8: RUF032 `for` loop variable compression is used outside of block + pandas/io/common.py:600:16: RUF032 `for` loop variable compression is used outside of block + pandas/io/common.py:604:43: RUF032 `for` loop variable compression is used outside of block + pandas/io/formats/excel.py:668:25: RUF032 `for` loop variable lnum is used outside of block + pandas/io/formats/excel.py:673:29: RUF032 `for` loop variable lnum is used outside of block + pandas/io/formats/excel.py:678:27: RUF032 `for` loop variable lnum is used outside of block + pandas/plotting/_matplotlib/boxplot.py:557:16: RUF032 `for` loop variable ax is used outside of block + pandas/tests/frame/indexing/test_take.py:48:13: RUF032 `for` loop variable df is used outside of block + pandas/tests/frame/indexing/test_take.py:50:13: RUF032 `for` loop variable df is used outside of block ... 8 additional changes omitted for project
python-poetry/poetry (+4 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ src/poetry/masonry/builders/editable.py:182:27: RUF032 `for` loop variable scripts_path is used outside of block + src/poetry/masonry/builders/editable.py:184:64: RUF032 `for` loop variable scripts_path is used outside of block + src/poetry/masonry/builders/editable.py:207:28: RUF032 `for` loop variable scripts_path is used outside of block + tests/utils/env/test_env.py:303:37: RUF032 `for` loop variable dist is used outside of block
rotki/rotki (+7 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ rotkehlchen/api/rest.py:559:49: RUF032 `for` loop variable idx is used outside of block + rotkehlchen/chain/zksync_lite/manager.py:828:25: RUF032 `for` loop variable target is used outside of block + rotkehlchen/tests/db/test_db_upgrades.py:2603:12: RUF032 `for` loop variable idx is used outside of block + rotkehlchen/tests/integration/test_zksynclite.py:108:25: RUF032 `for` loop variable idx is used outside of block + rotkehlchen/tests/integration/test_zksynclite.py:91:25: RUF032 `for` loop variable idx is used outside of block + rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:75: RUF032 `for` loop variable key is used outside of block + rotkehlchen/tests/unit/globaldb/test_globaldb_consistency.py:329:84: RUF032 `for` loop variable packaged_field is used outside of block
zulip/zulip (+10 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL
+ zerver/actions/streams.py:628:16: RUF032 `for` loop variable stream_id is used outside of block + zerver/actions/streams.py:631:71: RUF032 `for` loop variable stream_id is used outside of block + zerver/lib/markdown/__init__.py:1036:75: RUF032 `for` loop variable size_name is used outside of block + zerver/tests/test_auth_backends.py:239:32: RUF032 `for` loop variable backend_name is used outside of block + zerver/tests/test_auth_backends.py:256:32: RUF032 `for` loop variable backend_name is used outside of block + zerver/tests/test_message_send.py:1686:30: RUF032 `for` loop variable user is used outside of block + zerver/tests/test_message_send.py:1699:30: RUF032 `for` loop variable user is used outside of block + zerver/tests/test_user_topics.py:124:44: RUF032 `for` loop variable data is used outside of block + zerver/tests/test_user_topics.py:129:9: RUF032 `for` loop variable data is used outside of block + zerver/tests/test_user_topics.py:131:48: RUF032 `for` loop variable data is used outside of block
indico/indico (+2 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ bin/utils/storage_checksums.py:43:12: RUF032 `for` loop variable row is used outside of block + bin/utils/storage_checksums.py:46:57: RUF032 `for` loop variable row is used outside of block
python-trio/trio (+2 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ notes-to-self/socketpair-buffering.py:35:44: RUF032 `for` loop variable _count is used outside of block + src/trio/_tests/check_type_completeness.py:106:90: RUF032 `for` loop variable obj_name is used outside of block
pytest-dev/pytest (+20 -0 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ src/_pytest/assertion/rewrite.py:492:40: RUF032 `for` loop variable i is used outside of block + src/_pytest/assertion/rewrite.py:492:53: RUF032 `for` loop variable i is used outside of block + src/_pytest/assertion/rewrite.py:492:66: RUF032 `for` loop variable i is used outside of block + src/_pytest/assertion/rewrite.py:495:12: RUF032 `for` loop variable expl is used outside of block + src/_pytest/assertion/rewrite.py:713:23: RUF032 `for` loop variable item is used outside of block + src/_pytest/assertion/rewrite.py:713:50: RUF032 `for` loop variable item is used outside of block + src/_pytest/assertion/rewrite.py:714:22: RUF032 `for` loop variable item is used outside of block + src/_pytest/assertion/rewrite.py:716:22: RUF032 `for` loop variable item is used outside of block + src/_pytest/assertion/truncate.py:111:37: RUF032 `for` loop variable iterated_index is used outside of block + src/_pytest/assertion/truncate.py:112:30: RUF032 `for` loop variable iterated_index is used outside of block ... 10 additional changes omitted for project
... Truncated remaining completed project reports due to GitHub comment length restrictions
Changes by rule (1 rules affected)
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| RUF032 | 117 | 117 | 0 | 0 | 0 |
We discussed this internally and we agreed on:
- Using the
forintroduced variable outside of a loop is likely a bug. There are cases where this is intentional, but that's when using suppressions is fine. - Using the
withintroduced variable after thewithblock is more common and often acceptable. @AlexWaygood would you mind sharing an example? - There's a similar problem related to variables introduced by
match caseimport random match random.choice([True, False]): case True: False case x: y = 2 print(x)
The fact that for is most likely an error and with is not justifies that this rule should be split into two rules. I suggest that we start with the for loop and add it to the RUF ruleset.
Using the
withintroduced variable after thewithblock is more common and often acceptable. @AlexWaygood would you mind sharing an example?
A commit merged to the CPython repo earlier this morning has several examples in it: https://github.com/python/cpython/commit/0697188084bf61b28f258fbbe867e1010d679b3e.
Here's a slightly simpler example than any of the functions that commit touched, from CPython's dataclasses test suite: https://github.com/python/cpython/blob/9187484dd97f6beb94fc17676014706922e380e1/Lib/test/test_dataclasses/init.py#L3121C1-L3138C68:
import unittest
from dataclasses import dataclass
class FrozenTests(unittest.TestCase):
def test_non_frozen_normal_derived_from_empty_frozen(self):
@dataclass(frozen=True)
class D:
pass
class S(D):
pass
s = S()
self.assertFalse(hasattr(s, 'x'))
s.x = 5
self.assertEqual(s.x, 5)
del s.x
self.assertFalse(hasattr(s, 'x'))
with self.assertRaises(AttributeError) as cm:
del s.x
self.assertNotIsInstance(cm.exception, FrozenInstanceError)
The with self.assertRaises block is kept as small as possible, because the author of the test wanted to assert that the AttributeError exception is raised on exactly one line: the del s.x call. If an AttributeError was raised on another line, that would indicate a bug in the test or in the library; that should fail. So the with block is only one line long; but you want to save the cm binding that's created in the with statement so that you can do some extra assertions on it after the with block has ended.
I think tests are the places where this comes up most often. You also see this pattern a lot with pytest-style tests that use with pytest.raises() context managers; it's not just unittest-style tests.
I suggest that we start with the
forloop and add it to theRUFruleset.
Updated :white_check_mark:
CodSpeed Performance Report
Merging #11769 will degrade performances by 5.26%
Comparing MadLittleMods:madlittlemods/control-var-used-after-block (9bcc4fe) with main (c906b01)
Summary
❌ 1 regression
✅ 31 untouched
:warning: Please fix the performance issues or acknowledge them on CodSpeed.
Benchmarks breakdown
| Mode | Benchmark | BASE |
HEAD |
Change | |
|---|---|---|---|---|---|
| ❌ | Simulation | linter/all-rules[numpy/globals.py] |
726.2 µs | 766.5 µs | -5.26% |
Some interesting cases from the ecosystem check
Variable declared outside the loop
I think the code here is correct because the variable is first defined outside the loop as being None.
https://github.com/indico/indico/blob/a22e0c0a9c0cf4d40d04029489dc530b4819d2ea/bin/utils/storage_checksums.py#L39-L43
https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/io/common.py#L581-L599
(many more)
An easy fix here could be to skip the rule if a binding for the same name was already defined before the for loop. But it may also be intentional that we catch this as part of this rule. What do you think?
Use in combination with try..except
It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with try..except
https://github.com/python-trio/trio/blob/c9baca6045d427fd4cc0de2ca396ddeb4a0f40a5/src/trio/_tests/check_type_completeness.py#L62-L109
Multiple usages in the same statement
It would be great if we only flagged each variable once (at least per statement?)
- src/_pytest/assertion/rewrite.py:492:40: RUF032
forloop variable i is used outside of block - src/_pytest/assertion/rewrite.py:492:53: RUF032
forloop variable i is used outside of block - src/_pytest/assertion/rewrite.py:492:66: RUF032
forloop variable i is used outside of block
**Delete statements ** This is interesting. We may explicitly want to allow usages in delete statements
https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/init.py#L19
Control flow
This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis
https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/plotting/_matplotlib/boxplot.py#L520-L561 https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/embed/util.py#L266-L276 https://github.com/bokeh/bokeh/blob/829b2a75c402d0d0abd7e37ff201fbdfd949d857/src/bokeh/plotting/_figure.py#L476-L486