ruff
ruff copied to clipboard
Restore Salsa DB for exploring Salsa further
ruff-ecosystem results
Linter (stable)
✅ ecosystem check detected no linter changes.
Linter (preview)
✅ ecosystem check detected no linter changes.
Formatter (stable)
✅ ecosystem check detected no format changes.
Formatter (preview)
✅ ecosystem check detected no format changes.
CodSpeed Performance Report
Merging #11338 will not alter performance
Comparing red-knot-salsa (06ee178) with main (2e0a975)
Summary
✅ 30 untouched benchmarks
There's one limitation with the current model where the invalidation isn't as good as it could be and it is due to the fact that we build the entire symbol table at once (we don't have to and we could refactor that later).
Let's say we start with
# main.py
import foo;
x = foo.x
# foo.py
x = 10
def foo():
pass
And we infer the type of x in main. To do this, the implementation runs
- It parses
main, builds its symbol table and cfg, calls intoinfer_module_body - It resolves
foowhen reachingimport foo - It calls
ModuleType.memberwhen reachingfoo.x. This fans out to parsefoo, build its symbol table and control flow graph, and then runs module-level type inference forfoo - ...
When we now change the content of foo to
x = 10
def foo():
y = 10
What I expected is that the type inference for main wouldn't re-run because the module-level types of foo remain unchanged. However, that's not the case. The reason is that foo.x resolved the symbol table of foo.py and the symbol table has changed because we introduced y in the scope of foo. We have the same problem when a flag of an enclosing symbol changes. For example if the body of foo.foo is changed to y = x. The symbol table of that module changes because x now has the flag used.
We can avoid this by also building the symbol table per scope rather than once globally. Or have a query that reduces the global symbol table to just the global symbols. I do think something like that would be nice to have more fine granular invalidations.
This is a great write-up, thanks for taking the time! A few thoughts from the write-up, before I dive into the code:
- One caveat with building symbol tables per-scope is that the contents of nested functions can affect the symbol table of the enclosing function. (If a nested function uses
nonlocal xand assigns tox, the symbol kind forxmay change based on the knowledge that a nested function might assign to it.) So we can build symbol tables separately for module scope, class scope, and function scope -- but building a function symbol table needs to build all enclosed symbol tables (which may involve N nested class and function scopes, to arbitrary depth.) I think in practice this is still fine, though, and worth doing the split. Nesting isn't that common, and by definition the symbols inside a function scope aren't ones that another module can depend on, anyway. - My initial feeling is that what you are calling
Modulemaybe should be calledModuleName, and what you are callingResolvedModulemaybe should be calledModule. But I'm not totally sure until I look closer at the code. - I think if we are doing type inference per-scope rather than per-expression that will probably also recommend changes to how type inference works in the first place. We can probably just walk the AST for the scope, assigning types to expressions as we go, and similarly tracking narrowed local types for local symbols as we go. This will mean we don't even need
FlowGraphanymore (a lot of the concepts developed in building it will still carry over, it will just be resolved eagerly instead.)
One caveat with building symbol tables per-scope is that the contents of nested functions can affect the symbol table of the enclosing function.
The part that's unclear to me is how we would compute flags like USED when building the symbol table lazily per scope, because a symbol might be used in a child scope. But we also have the option to build the entire symbol table at once and then split it per scope (similar to the trick with SymbolIndex, build it once, then have sub-queries that only return a slice of that.
My initial feeling is that what you are calling Module maybe should be called ModuleName, and what you are calling ResolvedModule maybe should be called Module. But I'm not totally sure until I look closer at the code.
Yeah, but that would require that ModuleName becomes a sala ingredient. It might be fine but it makes ModuleName a bit more awkward to use. But if we keep ModuleName a regular struct, than what would you call the module thing that we pass to resolve_module? I might be overthinking this because we can make the argument to resolve_module private and only expose a resolve_module(db: &dyn Db, name: &str) function that internally converts the &str to that Module thing and we can then keep our existing terminology. Maybe that's for the best (it also hides complexity).
My thinking why I called the Module { name: ModuleName } a module is that I don't think that the existence of the file on the disk makes a module. When we have import foo, then there's an import of the foo module, regardless if that module exists or not. That's why I think that a module's identity is really defined by its name.
We can probably just walk the AST for the scope, assigning types to expressions as we go, and similarly tracking narrowed local types for local symbols as we go. This will mean we don't even need FlowGraph anymore (a lot of the concepts developed in building it will still carry over, it will just be resolved eagerly instead.)
This is what the new implementation does. But I must say, it would make me sad to see your CFG go away. I think it could be useful for other things than just typing, like an unreachable rule.
The part that's unclear to me is how we would compute flags like USED when building the symbol table lazily per scope, because a symbol might be used in a child scope.
The USED flag only means "used in current scope", so it's not an issue. The only analysis that crosses scopes is the one Patrick was working on, and that's the case I discussed in my comment; but it only applies to nested scopes within function scopes, so handling a function scope and all it's nested scopes together is one way to handle it; module scopes and class scopes not nested in functions can be fully independent.
build it once, then have sub-queries that only return a slice of that.
This could work, too.
When we have import foo, then there's an import of the foo module, regardless if that module exists or not. That's why I think that a module's identity is really defined by its name.
This is a good point. I think it's a solid enough reason for the current naming. We will need to be able to track dependencies on nonexistent modules.
This is what the new implementation does.
"tracking narrowed local types for local symbols as we go" is the part that would specifically replace the CFG; I don't think the implementation here does that yet.
it would make me sad to see your CFG go away. I think it could be useful for other things than just typing, like an unreachable rule.
The eager version of the same logic would also have the ability to discover unreachable branches.
Module Resolver
The module resolver remains mostly unchanged, although I did some renaming.
ModuleMy initial feeling is that what you are calling Module maybe should be called ModuleName, and what you are calling ResolvedModule maybe should be called Module. But I'm not totally sure until I look closer at the code.
Yeah, but that would require that
ModuleNamebecomes a sala ingredient. It might be fine but it makesModuleNamea bit more awkward to use. But if we keepModuleNamea regular struct, than what would you call the module thing that we pass toresolve_module? I might be overthinking this because we can make the argument toresolve_moduleprivate and only expose aresolve_module(db: &dyn Db, name: &str)function that internally converts the&strto thatModulething and we can then keep our existing terminology. Maybe that's for the best (it also hides complexity).
I wonder if what's currently called Module could be renamed to ModuleRequest. The user "requests" a module (and the request is represented with a ModuleRequest instance) by importing a module with a certain ModuleName, but they might not actually get a module back, because the module might not actually exist. Unlike the Module type on the main branch, your Module type in crates/red_knot/src/salsa_db/semantic/module.rs doesn't really feel like a Module to me, as you can't query any information about the module directly from the type -- you have to resolve it first, and it feels like the module object is the thing you get given at the end of the resolution process.
Thanks @AlexWaygood for the feedback. I don't plan to incorporate any of the code changes into this PR because I don't plan on merging. I'll incorporate your changes when working on the specific areas before pulling them into ruff.