ruff icon indicating copy to clipboard operation
ruff copied to clipboard

[flake8-type-checking] Adds implementation for TCH007 and TCH008

Open Daverball opened this issue 1 year ago • 9 comments

This is part of a series of pull requests fulfilling my promise in #9573 to port some of the enhancements and new rules from flake8-type-checking to ruff.

Summary

This adds the missing flake8-type-checking rules TC(H)007 and TC(H)008 including auto fixes.

There is some overlap between TC(H)007 and TC(H)004, currently the existing rule takes precedence, although ideally we would still emit both violations and share a fix for overlaps based on the selected strategy (if it is possible for violations of two different rules to share the same fix). We could probably move the analysis for TC(H007) into the same function as TC(H)004 in order to accomplish that. But we could defer that to a future pull request.

There is also some potential overlap between TC(H)008 and TC(H)010. Currently TC(H)010 is prioritized, but since it currently has no fix it might make more sense to either prioritize TC(H)008 or add a fix for TC(H)010, although that could be covered by a future pull request.

Test Plan

cargo nextest run

Daverball avatar Aug 16 '24 11:08 Daverball

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+6 -0 violations, +0 -0 fixes in 2 projects; 53 projects unchanged)

latchbio/latch (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/latch_cli/snakemake/config/utils.py:13:33: TC008 [*] Remove quotes from type alias
zulip/zulip (+5 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ analytics/views/stats.py:342:14: TC008 [*] Remove quotes from type alias
+ analytics/views/stats.py:344:16: TC008 [*] Remove quotes from type alias
+ confirmation/models.py:76:5: TC008 [*] Remove quotes from type alias
+ confirmation/models.py:77:5: TC008 [*] Remove quotes from type alias
+ zerver/lib/push_notifications.py:75:49: TC008 [*] Remove quotes from type alias
Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
TC008 6 6 0 0 0

github-actions[bot] avatar Aug 16 '24 11:08 github-actions[bot]

Looks like forward references to symbols that occur later in the code are not handled correctly yet. Is there already a standard way to perform a binding lookup in a runtime only context? Or will i have to manually compare the TextRange of the binding against the TextRange of the type alias' value expression?

Edit: I ended up doing the TextRange comparison for now.

Daverball avatar Aug 16 '24 12:08 Daverball

CodSpeed Performance Report

Merging #12927 will not alter performance

Comparing Daverball:feat/tch007-tch008 (8cf8f27) with main (e0f3eaf)

Summary

✅ 32 untouched benchmarks

codspeed-hq[bot] avatar Aug 16 '24 13:08 codspeed-hq[bot]

I'm struggling to implement an accurate runtime binding lookup since in addition to retrieving an accurate text range for the definition I need to special case class scopes:

class A:
    x: TypeAlias = 'A'  # fine
    class B:
        x: TypeAlias = 'A' # fine
        y: TypeAlias = 'B' # fine

None of these should trigger TCH008 and the unwrapped case should trigger TCH007. I could port the logic from flake8-type-checking but that means recursively traversing the scopes upwards and checking each binding in every scope until we find one that is available at runtime, i.e. skipping any symbols that occur lexicographically after the reference and also skipping the class binding of the class scope(s) we started in. In addition for AnnAssign we need to consider the TextRange of the statement, rather than the name to get correct lexicographical lookup behavior. Similar to parent_range for import bindings.

It might be sufficient to add a hash table of scope ids to a vector of binding ids for bindings that violate the lexicographical order for name lookups within that scope.

So the scope of A would contain a reference to the binding of A and the scope of B would contain references to the bindings of A and B. So if our linked binding happens to be in that list, then we can assume the lookup will result in a NameError at runtime.

Daverball avatar Aug 16 '24 15:08 Daverball

Looks like what I want is lookup_symbol but I will need to modify the state before the call so the function no longer thinks we're inside a forward reference. That will make things a lot more simple again.

Edit: Nevermind, it doesn't look like it takes lexicographical order into account and instead relies on being called at the right time during analysis, which puts me into a difficult spot, since if I do it at the right time, I will be forced to parse the type expression a second time before all the symbols can be resolved and if I do it at the wrong time I will have to jump through additional hoops to make the lookup behave correctly. Not sure what's better at this point. Instead of doing a full AST parse I could however do what I do in flake8-type-checking and use a simple Regex parser that just extracts all the names from the expression. That way the cost of parsing the expression twice should be lower.

Daverball avatar Aug 16 '24 20:08 Daverball

Doing the check earlier doesn't work either, because I still need to know about bindings that occur lexicographically later, so it can be distinguished from actually undefined names. In flake8-type-checking I perform two lookups, once in a runtime context and once in a typing context, and only if the typing lookup succeeds while the runtime lookup fails I emit a TCH008.

The current way the AST/semantic checker is structured those two lookups would have to happen at two completely different times. But that makes the logic to determine which references should prevent a TCH008 pretty complicated.

Maybe it makes more sense to do what I did in flake8-type-checking and add a new version of lookup_symbol that expects a ast::NameExpr and performs scope + lexicographical analysis to determine whether or not a symbol actually could be available at runtime when this lookup happens. It will however, as previously mentioned, require some additional bookkeeping for the class bindings that should be unavailable if we start the lookup inside a class scope.

The lookup for AnnAssign bindings will be more expensive too, since we will need to expand the range from the Name to the whole Stmt so it would overlap the range of references inside the value expression for recursive type aliases.

Daverball avatar Aug 17 '24 05:08 Daverball

@charliermarsh It might be worth to move the TCH010 logic to where the TCH008 logic currently sits, so we can actually autofix TCH010 in the same cases where we would otherwise emit a TCH008 by removing the quotes when we know that the quoted expression should be entirely available at runtime.

Concretely we would pass it at the stage where we parse deferred string annotations and check if the current parent expression is a | BinOp. Rather than walk the AST of type definitions to collect the string literals (which doesn't appear to currently properly handle deeply nested cases like eg. list["Foo" | None]).

The drawback would be that it may be more difficult to create an autofix in the opposite case that expands the quotes to the entire type expression, since we would need to walk the parent nodes in order to determine the root node of the type expression.

Daverball avatar Aug 18 '24 09:08 Daverball

Thanks @Daverball, sorry that I haven't had a chance to review this yet.

charliermarsh avatar Sep 03 '24 00:09 charliermarsh

@charliermarsh Just a friendly reminder in case this PR slipped through the cracks somehow

Daverball avatar Oct 09 '24 08:10 Daverball

@charliermarsh Another reminder. Please let me know if these reminders are a bother. The contribution guidelines don't really seem to contain any information about whether it's acceptable/appreciated/frowned upon to bump review requests, it tends to vary from project to project and maintainer to maintainer.

Daverball avatar Nov 05 '24 14:11 Daverball

No that's fine @Daverball -- sorry, it's just blocked on me finding time and clearly I haven't done a good job of doing so yet.

charliermarsh avatar Nov 05 '24 14:11 charliermarsh

@MichaReiser The tests should still be in their own test function, since both rules are enabled in that test function. That's how we made sure there were no circularity issues.

Daverball avatar Nov 27 '24 08:11 Daverball

@MichaReiser Also the fix safety comment is only correct for one of the rules, the other rule is always marked as unsafe but I still don't know why, I'm just relying on the fix safety of the other rule that uses the same fix.

Either way I'm going to force push my local branch and include some of your changes, I already made some of the same changes locally, but I also elaborated on the documentation of simulate_runtime_load.

Daverball avatar Nov 27 '24 08:11 Daverball

Oh sorry. I think it would be good to add a comment to that test explaining why it is a separate tests or it's likely that we'll merge the tests again when we promote the rule to preview.

Could you also extend/correct the fix safety comments because that's the only remaining thing that needs resolving before merging this PR

MichaReiser avatar Nov 27 '24 08:11 MichaReiser

@MichaReiser I think this should cover everything, but feel free to give it another cursory glance.

Daverball avatar Nov 27 '24 08:11 Daverball

Congrats @Daverball on implementing this rather complex rules!

MichaReiser avatar Nov 27 '24 08:11 MichaReiser