PLR0401 Cyclic import implementation
Hello fellow ruffians(?)
I was thinking of having a stab at implement pylint's R0401 cyclic import rule, but before I set off I thought it might be wise to ask your opinions and to clarify my current understanding of how ruff works.
Would I be correct to say that, all the python files to be linted are collected, then each one is linted individually in parallel?
It seems the expected output from lint_path is a Result<Diagnostics>. If I wanted to track imports, I noticed there was an imports.rs module, but to the best of my knowledge this is called via linter::check_path which is also only run per file?
My initial thoughts were perhaps to change the APIs somewhat to return a Vec of the imports, create a graph and then detect cycles.
👍 Related to the question that was just asked here: https://github.com/charliermarsh/ruff/issues/970#issuecomment-1432608303.
Everything you've said above is correct. Right now, we use a single-file model, so every file is analyzed independently. We do some work upfront to determine the package hierarchy (e.g., to determine the package root for every file), which is the only part of the analysis pipeline that depends on "multiple files", so-to-speak.
(And yeah -- imports.rs is for enforcing rules like "Are the imports in this file sorted?". That logic does depend on being able to determine which imports are first-party, but that is done by looking at the contents of the package root as described above, so it doesn't depend on any deeper analysis than "If we import foo, does a foo directory or foo.py file exist in src?)
I think you're right that we want the linter pass to return some kind of "dependency graph" for each module, which should include the members that the module "exports" (implicitly, or explicitly via __all__), and the members that the module depends on. If we had that information, it would unlock tons of rules -- cyclic imports (although this relies on detecting whether the import is at the top-level, I assume, which requires tracking that metadata too), but also, "member not defined in module", etc. (although for third-party modules, that would get trickier).
We could start with a subset of that behavior, though, e.g., just the modules that a given module depends on, as you've described above. I'd be open to reviewing a PR or proposal to that effect!
Thanks for the feedback, sounds good! I've not been as free as I'd like but I'll come up with something on the weekend.
Sorry I took so long to getting round to having a stab at this. Do you think it would make sense for me just to return the Located<U, T> Import and ImportFrom objects or should I be making some kind of custom type to house the full node path and its location?
How does the cyclic import rule work? Does it just detect cyclic imports at the top-level? I.e., are delayed or deferred imports (in function bodies) ignored?
I think either approach (returning AST nodes, or making a custom type) could work. We'll probably convert to some kind of internal representation when we do the cycle detection either way.
I can confirm that pylint doesn't detect cyclic imports in delayed/deferred imports and only in top-level imports.
It also doesn't attribute the cyclic import to the correct module neither. I made something which looked like this:
src
- __init__.py
- e
- __init__.py
- a.py (import src.f.a)
- f
- __init__.py
- a.py (from src.g import a as b)
- g
- __init__.py
- a.py (import src.e.a)
There are also the modules a-d too that I made with cyclic imports but what pylint throws out is:
************* Module src.c.__init__
src/c/__init__.py:1:0: R0401: Cyclic import (src.a -> src.b) (cyclic-import)
src/c/__init__.py:1:0: R0401: Cyclic import (src.e.a -> src.f.a -> src.g.a) (cyclic-import)
src/c/__init__.py:1:0: R0401: Cyclic import (src.c.a -> src.d.a) (cyclic-import)
Which I feel like is something we'd be able to avoid.
Given a go at at least propagating the information on all the imports every module has in #3243
Given Ruff's sheer speed, it has a good shot at implementing cyclic import detection even better than pylint and pyright !
pylint's cyclic-import and duplicated-code reports are inconsistant: https://github.com/PyCQA/pylint/issues/374
pyright still reports cyclic imports when said import is behind a TYPE_CHECKING check (aka a type-only import). https://github.com/microsoft/pyright/issues/3489#issuecomment-1133496121
Oh interesting! Yeah we do track that info, we should be able to differentiate between typing-only imports when detecting cycles :)
Any update on this? It's one of the main things I'd love to have given that my tests sometimes fail on circular imports and having the fast linter spot these before running the tests would be optimal. Thanks!
There's an attempt here #3880 Main issue I think is that it's quite different to the other rules as it needs information from other modules that are linted, which doesn't really vibe with how Ruff generally works.
Lovely! Good luck with that then. I wish I could help but unfortunately my Rust knowledge is non-existent :c
@charliermarsh CPython 3.12 now has low-overhead dynamic tracing, which could be used to trace imports and e.g., find cycles. https://docs.python.org/3.12/whatsnew/3.12.html#pep-669-low-impact-monitoring-for-cpython