rust-analyzer
rust-analyzer copied to clipboard
Just use trees for HIR?
EDIT: this now has a design doc: https://hackmd.io/@matklad/rJzQhvk2u/edit
One of our early design decisions was to keep all semantic info out of the syntax trees. I think that worked well for the compiler side of rust-analyzer -- we managed to do a bunch of relatively clear IRs, which are relatively incremental.
However, I am not so convinced that this split works quite as well for the hir API. When implementing, eg, assists, working with syntax is easy. Patterns like node.ancestors().find_map(ast::Expr::cast)
are emerging nice idioms. Going from syntax to semantics and back creates a lot of friction, both because there are two layers to start with, and because the bridges between them are unclear.
So I am thinking, what if hir exported API which looked exactly as our syntax-tree based API, but with additional semantic info? So that, you can ask .type
for an expression, .resolve
for path, etc.
When you ask for .parent
of a module/file, you transparently get the module in another file.
For macro calls, there are two children: .arg()
and .expansion()
, and .expansion()
goes into another file.
Similarly, .parent
transparently goes via expansion stack.
Note: got this idea after reading "early on, we decided that syntax trees would be the basic API of Roslyn" in some blog. My first reaction was "of course we can't do that, because macros", but then I realized that a macro can have both expansion and argument as children...
So, what would that concretely mean compared to the current API? I think 1. some types that currently are just data would gain an identity / location in the tree (e.g. so you can go back "up" from a Name
), 2. most types gain a parent
method if they don't have one, and some methods change to be more analogous to the syntax trees 3. we need to fill the current "holes" in the HIR (e.g. expressions)? :thinking: Or are you considering a more fundamental change to how the HIR types are represented? (Would the HIR methods still always take a database argument?)
I think this does sound like a good idea, and would probably give us some more guidance as to how to design the HIR APIs...
Would there be a HirNode
analogous to SyntaxNode
, so you can do some_expression_hir.ancestors().find_map(hir::Function::cast)
? I wonder whether the HIR needs that flexibility, or whether there could instead be something like some_expression_hir.ancestor::<hir::Function>()
.
Yeah, I think that's roughly what I have in mind. I'd add the following as well:
-
we get a
hir::Node
untyped type, analogous toSyntaxNode
, so that we can have.ancestrors()
method (your version is also good though!) -
macro expansion API is changed to be transparent (
hir::MacroCall::expansion/argument
) -
we somehow fill-in syntactic gaps (hir::Struct::struct_keyword needs to be a thing probably,
hir::get_token_at_offest
is also needed) -
(maybe) for syntax trees, we had quite a success with handles to nodes. When I think about
hir::Struct
as a handle, I think that maybe it's OK to embed a pointer to a database inside this handle?
Probably makes sense to take a second look at https://docs.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.semanticmodel?view=roslyn-dotnet-3.9.0, it feels like it has a lot of goodness
IsSpeculativeSemanticModel | Returns true if this is a speculative semantic model created with any of the TryGetSpeculativeSemanticModel methods
(Perpetual) Question: what should be the underlying storage strategy for such a hir?
I can see three different strategies:
// I. Copy type with reference to context
struct Store { ... }
#[derive(Clone, Copy)]
pub struct IfExpression<'a> {
id: u32,
store: &'a Store
}
impl IfExpression<'a> {
pub fn cond(self) -> Expression<'a> { ... }
}
// II. Copy type with explicitly passed-in context
pub struct Store { ... }
#[derive(Clone, Copy)]
pub struct IfExpression {
id: u32,
}
impl IfExpression {
pub fn cond(self, store: &Store) -> Expression { ... }
}
// III. !Copy type with shared reference to the context
#[derive(Clone)]
pub struct IfExpression {
id: u32,
store: Rc<Store>
}
impl IfExpression {
pub fn cond(self) -> Expression { ... }
}
I feel like option 2 could get annoying when you need to pass the store as well as the database to some function calls.
I think I would go with option 2 for now, like with the current HIR. I think the store can and maybe has to contain a reference to the DB as well?
Yeah, the store would have to have a DB
. How exactly it stores the db is also a question -- hir is polymorphic in the DB, and the caller can use a db with additional facts. I tried to account for that by making Semantics
parametric over the database, but I don't really like this design -- seems to complicated, and breaks separate compilation somewhat. Seems better to keep just dyn HirDatabase
in the store, and let the caller pass-in a better-typed reference themselves.
It also seems that the Store
should be bigger than the database -- I think we should allow for some store-local caches, like we do in semantics. So, it'll probably be store: &'a Store<'a>
.
Observation: III is basically how our syntax tree is setup actually.
Yeah. On the other hand, you don't usually need the DB to manipulate syntax trees, but you will need it to do things with the HIR. And I don't think we can put a DB reference into an Rc<Store>
:thinking: So this would end up being option 2 as well anyway...
You don't need DB literary, but a reference to a single SyntaxNode does give access to the whole. The role of DB is played by the root node, which owns the underlying data (green node). I don't think we can put a reference to db, but, presumably, we can store Rc<Snapshot<DB>>
or some such.
But it's a good point that, fundamentally, we have a lifetime here -- everything happens in a context of a running query. Even if we do fully owned API, that would be a footgun. If you store a hir node in some long-living datastructure, you'll notice that you can't actually apply modifications to the database, because the stored node will keep a database reference alive.
This I think rules out the third variant. Which is good -- it's the most tantalizing one!
Yeah. Of the remaining ones, option 1 is more inconvenient than 2, but it makes the connection of the nodes to the store clearer, and maybe that's a good thing :thinking:
The extra reference in option 1 is quite significant wrt. the memory usage, right?
Not sure, since we won't keep many of these around usually, I think?
Yeah, due to the lifetime, neither setup will be stored in long-term storage. Although optimizing transient memory usage and cache efficiency will be nice!
Couple of more considerations:
-
Option 2 can't implement useful
Debug/Display
directly on the hir nodes, which is inconvenient. (again, the fact that{}
works for trees is a godsend when it comes to figuring out why a certain assist does not work). -
At some point, we will expose this API for an unbounded set of people to use. If I think in that context, the "pass the store in explicitly" starts sounding more weird in comparison to only us using the API.
Option 2 can't implement useful Debug/Display directly on the hir nodes, which is inconvenient. (again, the fact that {} works for trees is a godsend when it comes to figuring out why a certain assist does not work).
You could have a .display()
function that takes the store and returns a value that implements Display
.
I am warming up to the idea more and more. At the same time, I feel a bit scared -- it looks like this might be the last time we have a chance to re-design the whole API. I feel a bit of waterfall can be helpful here! So, I suggest using this document to collaboratively iterate on the design:
https://hackmd.io/@matklad/rJzQhvk2u/edit
It's very bare bones, I plan to fill it in in the coming weeks, but I want to lay the design framework here upfront!
Hello, I'm currently working on marker (formally rust-linting) where I try to create a stable driver independent linting interface for Rust. The IR will be based on the Rust Reference, to ensure that it's (as much as possible) driver independent.
From reading this discussion and the linked document, it looks the goal of the these project mostly align. The project will mostly focus on usability and less on performance, which was mentioned in the linked document. However, based on the input, I'll try to make a few more things lazy evaluated and leave the cashing/node storage up to the driver.
Right now, the project is in a very early stage and the current test representation has to be rewritten for various reasons. So, there is probably very little/nothing that can be done right now to help. I mainly wanted to get this idea out there. :upside_down_face: Though, @HKalbasi has expressed interest to test rust-analyzer as a driver.