ruff icon indicating copy to clipboard operation
ruff copied to clipboard

[`flake8-simplify`] Make `SIM114` autofix unsafe when branches have different comments

Open phongddo opened this issue 1 month ago • 4 comments

Summary

Fixes https://github.com/astral-sh/ruff/issues/19576

Improves SIM114 rule to handle branch merging with comment preservation safely, offer unsafe fixes when branches contain different comments to prevent comment loss.

Problem

The original SIM114 autofix has an issue where it wasn't detecting trailing comments because it was checking comments within body_range starting from the end of line rather than the end of test as documented.

For example, this case was incorrectly handled:

if x == 1: # 1
    print("Hello")
elif x == 2: # 2
    print("Hello")

Running with --fix would produce:

-if x == 1: # 1
-    print("Hello")
-elif x == 2: # 2
+if x == 1 or x == 2: # 1
     print("Hello")

Would fix 1 error.

It offers a safe fix even though the comments are not identical. I believe this is exactly the case mentioned in the reported issue, so in this situation we should offer unsafe fixes instead. In my opinion, this is a good strategy and consistent with how other rules are handled as @ntBre mentioned in other PR linked to the issue (please let me know what you think).

Solution

  • Fixed body_range to start from the end of the test (as it's documented) instead of the end of the test's line, allowing detection of trailing comments and validation against comments in the other branch before merging branches.
  • Modified merge_branches to offer an unsafe fix when the code block is identical but comments differ, instead of skipping the merge entirely. Otherwise, a safe fix is offered as is.
  • Updated rule documentation to include Fix Safety section to explain unsafe fix in this scenario.

Manual Verification

✏️ Those ecosystem reports are about unsafe fixes that we offer when merging branches with different comments.

The above test case → unsafe fix

SIM114 Combine `if` branches using logical `or` operator
  |
1 | / if x == 1: # 1
2 | |     print("Hello")
3 | | elif x == 2: # 2
4 | |     print("Hello")
  | |__________________^
  |
help: Combine `if` branches

Found 1 error.
No fixes available (1 hidden fix can be enabled with the `--unsafe-fixes` option).

The issue's test case → unsafe fix

SIM114 Combine `if` branches using logical `or` operator
   |
 1 | / if isinstance(exc, HTTPError) and (
 2 | |     (500 <= exc.status <= 599)
 3 | |     or exc.status == 408  # server errors
 4 | |     or exc.status == 429  # request timeout
 5 | |     ):  # too many requests
 6 | |         return True
 7 | |
 8 | | # Consider all SSL errors as temporary. There are a lot of bug
 9 | | # reports from people where various SSL errors cause a crash
10 | | # but are actually just temporary. On the other hand, we have
11 | | # no information if this ever revealed a problem where retrying
12 | | # was not the right choice.
13 | | elif isinstance(exc, ssl.SSLError):
14 | |     return True
   | |_______________^
   |
help: Combine `if` branches

Found 1 error.
No fixes available (1 hidden fix can be enabled with the `--unsafe-fixes` option).

No comments test case → safe fix

SIM114 [*] Combine `if` branches using logical `or` operator
  |
1 | / if x == 1:
2 | |     print("Hello")
3 | | elif x == 2:
4 | |     print("Hello")
  | |__________________^
  |
help: Combine `if` branches

Found 1 error.
[*] 1 fixable with the `--fix` option.

Test Plan

  • Added a comment in snapshot test. I'm not sure how we can distinguish between a safe and an unsafe fix in the snapshot test, so please guide me 🙏

phongddo avatar Dec 19 '25 12:12 phongddo

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 2 projects; 53 projects unchanged)

binary-husky/gpt_academic (+0 -1 violations, +0 -0 fixes)

- main.py:164:33: SIM114 [*] Combine `if` branches using logical `or` operator
pypa/cibuildwheel (+1 -0 violations, +0 -0 fixes)

+ test/utils.py:268:62: RUF100 [*] Unused `noqa` directive (unused: `SIM114`)
Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
RUF100 1 1 0 0 0
SIM114 1 0 1 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 2 projects; 53 projects unchanged)

binary-husky/gpt_academic (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview

- main.py:164:33: SIM114 [*] Combine `if` branches using logical `or` operator
pypa/cibuildwheel (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview

+ test/utils.py:268:62: RUF100 [*] Unused `noqa` directive (unused: `SIM114`)
Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
RUF100 1 1 0 0 0
SIM114 1 0 1 0 0

astral-sh-bot[bot] avatar Dec 19 '25 12:12 astral-sh-bot[bot]

Hey @ntBre 👋 I've got this PR up with a suggestion to address the issue https://github.com/astral-sh/ruff/issues/19576. Please take a look whenever you have a moment. Thank you 😄

phongddo avatar Dec 19 '25 12:12 phongddo

Could you take a look at the ecosystem check too? I don't think we should be changing when the rule applies, only the safety of its fix, but it looks like there are new diagnostics showing up.

ntBre avatar Dec 19 '25 17:12 ntBre

I still need to review more closely

Sure, please take your time 🙏

Could you take a look at the ecosystem check too? I don't think we should be changing when the rule applies, only the safety of its fix, but it looks like there are new diagnostics showing up.

I've looked into those reports, so please correct me if I'm wrong, but I think those changes make sense since we now offer an unsafe fix when merging branches with different comments. Previously, we didn't merge branches at all and simply returned without any fix (this check). However, we ended up merging (without an unsafe fix) in the trailing comment scenario (the below example) due to the incorrect way we construct the body range that I fixed here, which I described in the PR's description:

if x == 1: # 1
    print("Hello")
elif x == 2: # 2
    print("Hello")

# Snapshot test case
if a:  # we preserve comments, too!
    b
elif c:  # but not on the second branch
    b

With the fix, we now correctly check comment equality and offer an unsafe fix instead of ignoring it. For example, this report from ecosystem

+ [libs/core/langchain_core/messages/block_translators/openai.py:76:13:](https://github.com/langchain-ai/langchain/blob/795e746ca70771925b65f9d3b1f551745a8316e7/libs/core/langchain_core/messages/block_translators/openai.py#L76) SIM114 Combine `if` branches using logical `or` operator

Previous → All checks passed!

After

SIM114 Combine `if` branches using logical `or` operator
  |
1 |   if filename := block.get("filename"):
2 |       file["filename"] = filename
3 | / elif (extras := block.get("extras")) and ("filename" in extras):
4 | |     file["filename"] = extras["filename"]
5 | | elif (extras := block.get("metadata")) and ("filename" in extras):
6 | |     # Backward compat
7 | |     file["filename"] = extras["filename"]
  | |_________________________________________^
  |
help: Combine `if` branches

Found 1 error.
No fixes available (1 hidden fix can be enabled with the `--unsafe-fixes` option).

In my opinion, providing an unsafe fix in this situation feels more consistent with other rules like ASYNC115. That said, please let me know if you disagree, I'm fine to revert and keep the previous behavior (by reverting the comments equality check, and it only returns safe fix) 😄

phongddo avatar Dec 19 '25 22:12 phongddo

I think there are two different kinds of comments at play here. I think I agree with the comments in this test case:

https://github.com/astral-sh/ruff/blob/adef89eb7cfb6eb7c3e0646dcd120e6f1adc7af2/crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM114.py#L107-L112

that such comments should disable the rule. This seems like a clear signal to me that the user wants the two blocks of code to be separate, even if the code in the body happens to be the same.

In contrast, this case:

https://github.com/astral-sh/ruff/blob/e3498121b4700f5a9dce9bf6d72af4bc411e262d/crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM114.py#L138-L141

seems like a known bug to me, recorded in our test cases. Removing this second comment should make the fix unsafe.

As I commented on the first PR (https://github.com/astral-sh/ruff/pull/19759#pullrequestreview-3112435910), I think one easy fix here would be to check if the replacement range contains any comments and mark the fix as unsafe if it does, separate from the first_comments.eq(second_comments) check.

However, I also kind of feel like the case in the issue, something like this:

def foo():
    if isinstance(exc, HTTPError) and (
		(500 <= exc.status <= 599)
		or exc.status == 408  # server errors
		or exc.status == 429  # request timeout
	):  # too many requests
        return True

	# Consider all SSL errors as temporary. There are a lot of bug
	# reports from people where various SSL errors cause a crash
	# but are actually just temporary. On the other hand, we have
	# no information if this ever revealed a problem where retrying
	# was not the right choice.
    elif isinstance(exc, ssl.SSLError):
        return True

falls pretty clearly into the first category. If I wrote this comment, I would not want an autofix to delete it, and as a result, I probably wouldn't want the rule to fire at all. So another bug fix here would be to associate that comment with one of the bodies so that the rule doesn't fire.

Overall, I think we should just avoid a diagnostic and fix entirely in both of these cases to avoid trying to differentiate between comments it's okay to delete and those it's not.

Whatever we end up doing, I think we need to add a test case like the sample from the issue because that comment placement seems a bit different from any of the other tests we have.

ntBre avatar Dec 24 '25 16:12 ntBre

Overall, I think we should just avoid a diagnostic and fix entirely in both of these cases to avoid trying to differentiate between comments it's okay to delete and those it's not.

I agree 👍 avoid a diagnostic and fix makes sense to me!

phongddo avatar Dec 24 '25 18:12 phongddo

@ntBre I reverted the unsafe fix and kept only the body_range fix here. This change would skip the diagnostic and fix entirely where branch comments differ. I also added a test case similar to the one mentioned in the issue (and adjust comments in other test cases). Please let me know what you think 😄

phongddo avatar Dec 24 '25 18:12 phongddo

Thank you!

Oh no, I think I see why end-of-line comments might have been treated separately :grimacing: There's a newly removed RUF100 unused noqa comment diagnostic in the ecosystem report:

if platform == "pyodide" and python_abi_tags is None:
    python_abi_tags = [
        "cp312-cp312",
        "cp313-cp313",
    ]
elif platform == "android" and python_abi_tags is None:  # noqa: SIM114
    python_abi_tags = [
        "cp313-cp313",
        "cp314-cp314",
    ]
elif platform == "ios" and python_abi_tags is None:
    python_abi_tags = [
        "cp313-cp313",
        "cp314-cp314",
    ]

I think this comment is now preventing SIM114 from firing, making it "unused" as a noqa comment, but removing it would make SIM114 trigger again. Apparently this has come up before, but I didn't realize it (https://github.com/astral-sh/ruff/issues/6025). Let's make sure to add a comment about this somewhere, it's a bit surprising.

I think we can fix this if we keep the diagnostic for removing end-of-line comments but make the fix unsafe, but suppress the diagnostic for comments like the ones in the issue. Is that doable?

Sorry for the back and forth, but I keep noticing new things in the ecosystem check!

ntBre avatar Dec 25 '25 15:12 ntBre

I think we can fix this if we keep the diagnostic for removing end-of-line comments but make the fix unsafe, but suppress the diagnostic for comments like the ones in the issue. Is that doable?

Could we detect if any of the comments are a pragma comment instead and not emit a diagnostic in that case?

MichaReiser avatar Dec 26 '25 08:12 MichaReiser

Could we detect if any of the comments are a pragma comment instead and not emit a diagnostic in that case?

Ah that's a good idea, I'd prefer not to emit diagnostics for the end-of-line comments otherwise too.

(We actually need to emit a diagnostic only if it's a pragma comment so that the pragma isn't unused, but I definitely agree with treating them specially)

ntBre avatar Dec 26 '25 15:12 ntBre