rust-analyzer
rust-analyzer copied to clipboard
Get rid of mutable syntax node usage
Somewhat meant as a means of discussion (though ultimately just expressing my opinion):
After having used and reviewed a bunch of PRs using the mutable syntax node API I am personally not too happy about it. It can be awkward to use at times and has some pitfalls that one has to be aware of. A major problem with that API is that it introduces iterator invalidation (you can iterate while mutating the tree), giving way to easily end up panicking if one does not delay the edits after iteration via collecting into a vec or similar. Likewise it makes it harder to replace our version of rowan to one that uses proper slots which is something I am in favor of, as the slot API is unlikely to have the same feature (unless we implement it back into rome's fork of rowan which would be the current slot based rowan solution).
The current benefit of it is that we have the ted api that allows automagically fixing up whitespace on edits, but that's really not necessary in the end game when we have our own syntax formatter which would allow us to format edited nodes on the fly. It is also a rather powerful api in that you can patch up syntax trees on the fly.
The main question then arises, how do we do edits if not via mutable syntax nodes? The current approach was using the handwritten make module to stitch nodes together (which is still somewhat used). But that approach is really annoying to use, inefficient and looks horrid as well. One nice looking approach would be having a quasi quote api as described in https://github.com/rust-lang/rust-analyzer/issues/2227, that could also do some static checks ideally (like checking well typedness of the tree at comp time maybe). With quoting one could just reconstruct the patched tree / construct a the biggest tree encompassing the changes of interested and then ask the assist building api to replace the node pointed to by a specific node ptr with the constructed tree. And alternative to mutation is https://github.com/rust-lang/rust-analyzer/issues/9649 though I haven't looked too much into it.
In general, assists are all over the place currently, some do text based editing, some do make api + stringified text edits and finally we have some that use the mutable node api. Obviously not ideal if we mix and match however we like to.
(Issue spawned of me really not enjoying reviewing assist PRs nowadays anymore, as they tend to be messy in general, though also because I really want us to switch to the trailing trivia model)
cc @matklad feels like an issue you'd have an opinion about or two :)
Yeah, I wouldn’t say that mutable syntax API is a success :-)
That being said, I think it barks at the right tree: the fundamental capability there is “tracking nodes across edits”, and I think it would be important to have that in one way or another. But I am not too confident here: maybe we don’t need cursor API at all?
One curious solution here is Roslyn’s SyntaxAnnotations:
https://joshvarty.com/2015/09/18/learn-roslyn-now-part-13-keeping-track-of-syntax-nodes-with-syntax-annotations/
my understanding is that this mechanism powers all “mutable” editing APIs, which allow you to incrementally mutate the tree.
I looked at how that’s implemented under the hood, and I think it actually is pretty horrible —- there’s a global static weak map of annotated nodes (no link to source, so I might be misremembering)
I think another related issue is how our semantics API can panic if you feed it with a “wrong” Syntax Node. I was thinking that maybe the solution to both problems is biting the bullet and ensuring that every single bit of syntax has a globally unique id?
Definitely need to change something here, though I still am not 100% positive about the ideal end state :P
Another thing I just realized, this API makes it near impossible to map up edits back into macro input token trees
I feel like using both approaches of using SyntaxEditor and using SyntaxAnnotations could work as a replacement to the mutable syntax node approach. I mention SyntaxAnnotations because they could, I dunno, be used to figure out where to place LSP snippets 🙂
Since a SyntaxEditor would store all of the edit actions (insert, replace, delete, etc.) to transform the original tree into a final tree, we retain information on how nodes are transformed. This also means that a SyntaxEditor can propagate SyntaxAnnotations forward to the final tree based on this information, so long as the SyntaxEditor knows what nodes are tracked.
A caveat of letting SyntaxEditor propagate SyntaxAnnotations is that the make constructors will need to inform the SyntaxEditor of where to propagate SyntaxAnnotations to, otherwise we lose track of nodes. Possible approaches of resolving this could be:
- Require passing in a
SyntaxEditorto amakeconstructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for themakeconstructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate theSyntaxEditorinput from the actual syntax to construct - Have the
makeconstructors/quasi-quoting return aSyntaxResultwhich holds a mapping from old to new nodes and have an explicitinto_syntax(&mut SyntaxEditor)which adds the mappings toSyntaxEditor. - Give up and use a static global concurrent map. It's certainly an option 🙃
One interesting situation to handle is when trying to change both a parent node and its children's nodes, e.g. PathTransform trying to transform
SomeTrait<Item = [u8; SOME_CONSTANT]>;
into
m1::SomeTrait<Item = [u8; m2::SOME_CONSTANT]>;
(based on #17321)
Roslyn's approach is to have a ReplaceNode that takes in a lambda to compute the replacement node. This has a downside of serializing transformation (see https://github.com/dotnet/roslyn/issues/28378), but maybe partitioning the changes list by each computed replacement change may help?
Other idle thoughts:
- Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see #17367)
- Do we want to allow combining
SyntaxEditorchanges? Would be useful for e.g.PathTransform, since it could produce aSyntaxEditorwith all of the changes instead of needing to take in the currentSyntaxEditor. - Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.
- Moving away from mutable syntax nodes will necessarily affect the
edit_in_placeAPIs, since they assume that they'll refer to the same node. Passing in aSyntaxEditorwill help with creating a new version of the node, but we'll still have a reference to the old node. Maybe just use the quasi-quote macro instead?
Require passing in a SyntaxEditor to a make constructor (or potential quasi-quoting macro) and then let it add mappings from the old nodes to the new nodes. This would be fine for the make constructors, but would be a bit cumbersome for the quasi-quoting macro since you'd need to separate the SyntaxEditor input from the actual syntax to construct
That seems fine to me, make functions could be exposed as methods on a SyntaxEditor I guess which would make it a bit more convenient. As for quasi quoting, sure it'll be a bit more annoying but at the same time it feels bit a like how the quote macro has an extra optional input for re-spanning the tokens. Feels like we can do it the same way, where having no extra input just creates a standalone node as well if that's useful. Overall this kind of API seems a lot nicer still than mutable syntax trees as with those you have to actually fight at times when iterating and mutating at the same time.
Have the make constructors/quasi-quoting return a SyntaxResult which holds a mapping from old to new nodes and have an explicit into_syntax(&mut SyntaxEditor) which adds the mappings to SyntaxEditor.
This sounds a lot worse usability wise than option 1 to me.
Give up and use a static global concurrent map. It's certainly an option 🙃
I don't think anything needs to be said here 😬
Do we want to expose the most recent version of the tree? This would imply applying the transformations to a tree at some point (either after adding any change, or whenever we request for the current tree). Not sure if there's anywhere where we do need to inspect the current version of the tree, and the current semantic model will panic on any newly created nodes (see https://github.com/rust-lang/rust-analyzer/issues/17367)
I feel like we don't? At least I can't tell an immediate use case (we have one place where we kind of do this right now which is in adjustment inlay hints, but that is really just a hack we need to replace generally)
Do we want to allow combining SyntaxEditor changes? Would be useful for e.g. PathTransform, since it could produce a SyntaxEditor with all of the changes instead of needing to take in the current SyntaxEditor.
Feels like something that should be naturally supported.
Is tracking all node transformations enough to up-map changes back to the original macro inputs? I feel like this helps somewhat, but I'm not sure of the specifics of what is required for that.
I'd expect us to only touch real file trees here. That is if we are inside a macro call we ought to first upmap that out and then attempt any edits. Anything else gets hairy for no benefit (in my eyes at least).
That is if we are inside a macro call we ought to first upmap that out and then attempt any edits.
For assists like add_missing_match_arms, it'd be nice if there was a way to edit a synthetic version of the macro input for declarative macros (i.e. the "parsed" macro input) and re-serialize that rather than plain token trees, but that's another issue (do we even have an issue for this? I know it's something we maybe wanted to have).
Ah right, yes that should be in theory possible
I'll try and summarize my understanding of (the mostly agreed upon?) plan to accomplish this:
- Finish @DropDemBits's branch that introduces a
SyntaxEditor.- The remaining work on that branch is to implement
SyntaxAnnotationpropagation, which boils down to applyingSyntaxMapping::upmap_child_element(annotated_element, first_mappable_ancestor, root)to each(SyntaxElement, SyntaxAnnotation). - Bee has informed me over Zulip (please correct me if I'm wrong!) that someone could finish and land their work, as they won't be able to work on this until September.
- The remaining work on that branch is to implement
- After that branch is landed with the
SyntaxMappingmapping changes, all assists should be moved over the newSyntaxEditorAPI. I'm pretty confident that some assists can't be moved over to theSyntaxEditorAPI, soSyntaxEditorwould need grow new APIs as those assists are migrated to the syntax editor API. - Once all assists are on the
SyntaxEditorAPI, mutable syntax trees can be removed from Rowan andSyntaxEditorcan be implemented in terms of immutable syntax trees.
To summarize a meeting with Lukas Wirth, Wilfred Hughes, David Richey, and me (notes here), removing mutable syntax trees from Rowan would allow the removal of doubly-linked lists in Rowan in favor of a more contiguous representation (e.g., a vector). I've determined through my (relatively unscientific!) benchmarking that this could make Rowan twice as fast.
Also, this whole refactor would make for a great post for the blog, if anyone is keen to write one!
Confirming that I'm okay with handing off the work to someone else, excited to see it land regardless
I'll try the SyntaxEditor thing unless someone else is working on it
@ShoyuVanilla Have you had the chance to pick this up yet? I'm available to work on this again now that I've got time to do so.
Not yet. And I think that it would be better if you could take this because you know this better than me 😅