rust-analyzer icon indicating copy to clipboard operation
rust-analyzer copied to clipboard

Just use trees for HIR?

Open matklad opened this issue 3 years ago • 19 comments

EDIT: this now has a design doc: https://hackmd.io/@matklad/rJzQhvk2u/edit

One of our early design decisions was to keep all semantic info out of the syntax trees. I think that worked well for the compiler side of rust-analyzer -- we managed to do a bunch of relatively clear IRs, which are relatively incremental.

However, I am not so convinced that this split works quite as well for the hir API. When implementing, eg, assists, working with syntax is easy. Patterns like node.ancestors().find_map(ast::Expr::cast) are emerging nice idioms. Going from syntax to semantics and back creates a lot of friction, both because there are two layers to start with, and because the bridges between them are unclear.

So I am thinking, what if hir exported API which looked exactly as our syntax-tree based API, but with additional semantic info? So that, you can ask .type for an expression, .resolve for path, etc.

When you ask for .parent of a module/file, you transparently get the module in another file.

For macro calls, there are two children: .arg() and .expansion(), and .expansion() goes into another file.

Similarly, .parent transparently goes via expansion stack.

matklad avatar May 03 '21 17:05 matklad

Note: got this idea after reading "early on, we decided that syntax trees would be the basic API of Roslyn" in some blog. My first reaction was "of course we can't do that, because macros", but then I realized that a macro can have both expansion and argument as children...

matklad avatar May 03 '21 17:05 matklad

So, what would that concretely mean compared to the current API? I think 1. some types that currently are just data would gain an identity / location in the tree (e.g. so you can go back "up" from a Name), 2. most types gain a parent method if they don't have one, and some methods change to be more analogous to the syntax trees 3. we need to fill the current "holes" in the HIR (e.g. expressions)? :thinking: Or are you considering a more fundamental change to how the HIR types are represented? (Would the HIR methods still always take a database argument?)

I think this does sound like a good idea, and would probably give us some more guidance as to how to design the HIR APIs...

flodiebold avatar May 03 '21 17:05 flodiebold

Would there be a HirNode analogous to SyntaxNode, so you can do some_expression_hir.ancestors().find_map(hir::Function::cast)? I wonder whether the HIR needs that flexibility, or whether there could instead be something like some_expression_hir.ancestor::<hir::Function>().

flodiebold avatar May 03 '21 17:05 flodiebold

Yeah, I think that's roughly what I have in mind. I'd add the following as well:

  • we get a hir::Node untyped type, analogous to SyntaxNode, so that we can have .ancestrors() method (your version is also good though!)

  • macro expansion API is changed to be transparent (hir::MacroCall::expansion/argument)

  • we somehow fill-in syntactic gaps (hir::Struct::struct_keyword needs to be a thing probably, hir::get_token_at_offest is also needed)

  • (maybe) for syntax trees, we had quite a success with handles to nodes. When I think about hir::Struct as a handle, I think that maybe it's OK to embed a pointer to a database inside this handle?

matklad avatar May 03 '21 17:05 matklad

Probably makes sense to take a second look at https://docs.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.semanticmodel?view=roslyn-dotnet-3.9.0, it feels like it has a lot of goodness

IsSpeculativeSemanticModel | Returns true if this is a speculative semantic model created with any of the TryGetSpeculativeSemanticModel methods

matklad avatar May 06 '21 22:05 matklad

(Perpetual) Question: what should be the underlying storage strategy for such a hir?

I can see three different strategies:

// I. Copy type with reference to context
struct Store { ... }

#[derive(Clone, Copy)]
pub struct IfExpression<'a> {
    id: u32,
    store: &'a Store
}

impl IfExpression<'a> {
    pub fn cond(self) -> Expression<'a> { ... }
}

// II. Copy type with explicitly passed-in context
pub struct Store { ... }

#[derive(Clone, Copy)]
pub struct IfExpression {
    id: u32,
}

impl IfExpression {
    pub fn cond(self, store: &Store) -> Expression { ... }
}

// III. !Copy type with shared reference to the context
#[derive(Clone)]
pub struct IfExpression {
    id: u32,
    store: Rc<Store>
}

impl IfExpression {
    pub fn cond(self) -> Expression { ... }
}

matklad avatar Jun 14 '21 11:06 matklad

I feel like option 2 could get annoying when you need to pass the store as well as the database to some function calls.

Veykril avatar Jun 14 '21 11:06 Veykril

I think I would go with option 2 for now, like with the current HIR. I think the store can and maybe has to contain a reference to the DB as well?

flodiebold avatar Jun 14 '21 11:06 flodiebold

Yeah, the store would have to have a DB. How exactly it stores the db is also a question -- hir is polymorphic in the DB, and the caller can use a db with additional facts. I tried to account for that by making Semantics parametric over the database, but I don't really like this design -- seems to complicated, and breaks separate compilation somewhat. Seems better to keep just dyn HirDatabase in the store, and let the caller pass-in a better-typed reference themselves.

It also seems that the Store should be bigger than the database -- I think we should allow for some store-local caches, like we do in semantics. So, it'll probably be store: &'a Store<'a>.

matklad avatar Jun 14 '21 12:06 matklad

Observation: III is basically how our syntax tree is setup actually.

matklad avatar Jun 14 '21 22:06 matklad

Yeah. On the other hand, you don't usually need the DB to manipulate syntax trees, but you will need it to do things with the HIR. And I don't think we can put a DB reference into an Rc<Store> :thinking: So this would end up being option 2 as well anyway...

flodiebold avatar Jun 15 '21 08:06 flodiebold

You don't need DB literary, but a reference to a single SyntaxNode does give access to the whole. The role of DB is played by the root node, which owns the underlying data (green node). I don't think we can put a reference to db, but, presumably, we can store Rc<Snapshot<DB>> or some such.

But it's a good point that, fundamentally, we have a lifetime here -- everything happens in a context of a running query. Even if we do fully owned API, that would be a footgun. If you store a hir node in some long-living datastructure, you'll notice that you can't actually apply modifications to the database, because the stored node will keep a database reference alive.

This I think rules out the third variant. Which is good -- it's the most tantalizing one!

matklad avatar Jun 15 '21 08:06 matklad

Yeah. Of the remaining ones, option 1 is more inconvenient than 2, but it makes the connection of the nodes to the store clearer, and maybe that's a good thing :thinking:

flodiebold avatar Jun 15 '21 08:06 flodiebold

The extra reference in option 1 is quite significant wrt. the memory usage, right?

lnicola avatar Jun 15 '21 17:06 lnicola

Not sure, since we won't keep many of these around usually, I think?

flodiebold avatar Jun 16 '21 15:06 flodiebold

Yeah, due to the lifetime, neither setup will be stored in long-term storage. Although optimizing transient memory usage and cache efficiency will be nice!

Couple of more considerations:

  • Option 2 can't implement useful Debug/Display directly on the hir nodes, which is inconvenient. (again, the fact that {} works for trees is a godsend when it comes to figuring out why a certain assist does not work).

  • At some point, we will expose this API for an unbounded set of people to use. If I think in that context, the "pass the store in explicitly" starts sounding more weird in comparison to only us using the API.

matklad avatar Jun 20 '21 16:06 matklad

Option 2 can't implement useful Debug/Display directly on the hir nodes, which is inconvenient. (again, the fact that {} works for trees is a godsend when it comes to figuring out why a certain assist does not work).

You could have a .display() function that takes the store and returns a value that implements Display.

bjorn3 avatar Jun 20 '21 16:06 bjorn3

I am warming up to the idea more and more. At the same time, I feel a bit scared -- it looks like this might be the last time we have a chance to re-design the whole API. I feel a bit of waterfall can be helpful here! So, I suggest using this document to collaboratively iterate on the design:

https://hackmd.io/@matklad/rJzQhvk2u/edit

It's very bare bones, I plan to fill it in in the coming weeks, but I want to lay the design framework here upfront!

matklad avatar Jun 22 '21 14:06 matklad

Hello, I'm currently working on marker (formally rust-linting) where I try to create a stable driver independent linting interface for Rust. The IR will be based on the Rust Reference, to ensure that it's (as much as possible) driver independent.

From reading this discussion and the linked document, it looks the goal of the these project mostly align. The project will mostly focus on usability and less on performance, which was mentioned in the linked document. However, based on the input, I'll try to make a few more things lazy evaluated and leave the cashing/node storage up to the driver.

Right now, the project is in a very early stage and the current test representation has to be rewritten for various reasons. So, there is probably very little/nothing that can be done right now to help. I mainly wanted to get this idea out there. :upside_down_face: Though, @HKalbasi has expressed interest to test rust-analyzer as a driver.

xFrednet avatar Aug 09 '22 12:08 xFrednet