aerial.nvim

Treesitter HTML support, nested languages

Open: mikaraunio opened this issue 3 years ago · 4 comments

As I'm not quite happy with the HTML outline provided by LSP, I thought I'd try writing some Treesitter queries to get basic HTML support into Aerial. I think I have something workable and will submit a PR.

Working on this, however, got me thinking about support for nested languages. It would be great if Aerial could support JS and CSS in HTML (and Lua in Vimscript by the same token), since Treesitter already provides the data. Any plans for this?

mikaraunio, Jan 25 '23 04:01

I don't currently have any plans to support nested languages. It would be a little tricky because of how the treesitter language extensions work, but it's definitely possible.

stevearc, Jan 28 '23 23:01

I've been toying with this, and very basic support turned out to be fairly trivial. After getting the buffer's parser with helpers.get_parser, instead of calling :parse(), I wrapped the entire query-processing code in:

buf_parser:for_each_tree(function(syntax_tree, parser)
  -- syntax_tree: a TSTree; parser: the LanguageTree (root or injected) that owns it
...
end)

Proper nesting is the hard problem, since it involves cross-linking matches in different trees, with theoretically unbounded nesting depth. However, if a separate tree for each injection is sufficient, then that's pretty much it on the parsing side.

Alternatively, instead of for_each_tree, an on_changedtree callback registered through a recursive register_cbs could be used. It avoids the need to register autocommands and should react to tree updates caused by, for example, injections that are parsed late.
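A minimal sketch of the register_cbs variant, assuming a Neovim version in which LanguageTree:register_cbs accepts a second `recursive` argument, so that injected child trees (including ones added later) inherit the callbacks. `watch_symbol_trees` and `on_tree_changed` are hypothetical names, and vim.treesitter.get_parser stands in for aerial's helpers.get_parser.

```lua
-- Hypothetical helper: call on_tree_changed(tree) whenever the root LanguageTree
-- or any injected child tree reports syntactic changes.
-- Assumes LanguageTree:register_cbs(cbs, recursive) supports the recursive flag.
local function watch_symbol_trees(bufnr, on_tree_changed)
  local parser = vim.treesitter.get_parser(bufnr)
  parser:register_cbs({
    -- Receives the changed ranges (ignored here) and the TSTree that changed.
    on_changedtree = function(_, tree)
      on_tree_changed(tree)
    end,
  }, true) -- true: also apply to child (injected) trees, present and future
  return parser
end
```

As far as I understand, on_changedtree fires from inside LanguageTree:parse(), so something (for example the highlighter) still has to parse the buffer, and deferring the actual symbol refresh with vim.schedule is probably safer than doing heavy work inside the callback.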

Slotos, Oct 24 '23 11:10

While working on conform.nvim, I learned that some injections are "combined": all of their regions are merged and parsed together as one tree, so it is very difficult, maybe impossible, to determine where each of them is actually nested in the document.

A simple example is the HTML comments in a Markdown file:

text

<!-- comment -->

text

<!-- comment -->

If you open that in Neovim and look at the tree with :InspectTree, you'll see that both comments appear at the top. I had to do some ugly work and access private APIs to work around this in conform.nvim: https://github.com/stevearc/conform.nvim/blob/3fc2c956d99216b2816f07d2b946020ba2e02457/lua/conform/formatters/injected.lua#L107-L119

But if one were to use that same logic, it might be possible to create a nested list of symbols from the nested LanguageTrees.

stevearc, Dec 01 '23 07:12

I believe it can be simplified. For every symbol we know its starting position. By walking up the tree of LanguageTrees, the smallest parent match containing that starting position can be found, and that match determines where the symbol is placed.
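A naive version of this lookup, as a sketch: each injection region is assumed to be a plain table holding a language name and a 0-indexed, end-exclusive {start_row, start_col, end_row, end_col} range (a hypothetical shape, not aerial's or Neovim's data structure), and regions are assumed to nest properly. `innermost_region` scans every region, so each lookup costs O(n) in the number of regions.

```lua
-- True if position (ar, ac) comes strictly before (br, bc).
local function pos_lt(ar, ac, br, bc)
  return ar < br or (ar == br and ac < bc)
end

-- True if the end-exclusive range r = {sr, sc, er, ec} contains (row, col).
local function contains(r, row, col)
  return not pos_lt(row, col, r[1], r[2]) and pos_lt(row, col, r[3], r[4])
end

-- True if range `inner` lies entirely within range `outer`.
local function within(inner, outer)
  return not pos_lt(inner[1], inner[2], outer[1], outer[2])
    and not pos_lt(outer[3], outer[4], inner[3], inner[4])
end

-- Naive O(n) scan: the innermost region containing (row, col), or nil.
local function innermost_region(regions, row, col)
  local best = nil
  for _, region in ipairs(regions) do
    if contains(region.range, row, col)
      and (best == nil or within(region.range, best.range)) then
      best = region
    end
  end
  return best
end
```

For a symbol starting inside a script region of an HTML document, this returns the script region rather than the enclosing document region.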

The problem is that this matching is expensive if performed naively, no matter which way we walk the tree of parsers. I'm considering implementing an interval tree just for this, but it will have to be well optimised (i.e. no linked lists etc.) to be fast on smaller trees.
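One flat-array alternative to a full interval tree, sketched under the assumption that the ranges nest properly (no partial overlaps), which combined injections may violate: sort the ranges by start (outer range first on ties), record each range's parent with a stack, then answer a lookup by binary-searching for the last range that starts at or before the position and climbing parent indices until one contains it. This is a swapped-in technique, not an interval tree, and `build_index`/`innermost` are hypothetical names. A lookup costs O(log n + d) for nesting depth d.

```lua
-- True if position (ar, ac) comes strictly before (br, bc).
local function pos_lt(ar, ac, br, bc)
  return ar < br or (ar == br and ac < bc)
end

-- True if the end-exclusive range r = {sr, sc, er, ec} contains (row, col).
local function contains(r, row, col)
  return not pos_lt(row, col, r[1], r[2]) and pos_lt(row, col, r[3], r[4])
end

-- Build flat arrays (sorted ranges + parent indices) for properly nested ranges.
local function build_index(ranges)
  local sorted = {}
  for i, r in ipairs(ranges) do sorted[i] = r end
  table.sort(sorted, function(a, b)
    if a[1] ~= b[1] or a[2] ~= b[2] then
      return pos_lt(a[1], a[2], b[1], b[2])
    end
    return pos_lt(b[3], b[4], a[3], a[4]) -- same start: longer (outer) range first
  end)
  local parent, stack = {}, {}
  for k, r in ipairs(sorted) do
    -- Pop candidates that end at or before this range starts; they cannot enclose it.
    while #stack > 0 do
      local top = sorted[stack[#stack]]
      if pos_lt(r[1], r[2], top[3], top[4]) then break end
      stack[#stack] = nil
    end
    parent[k] = stack[#stack] -- nil when the stack is empty (top-level range)
    stack[#stack + 1] = k
  end
  return { ranges = sorted, parent = parent }
end

-- Innermost range containing (row, col), or nil.
local function innermost(index, row, col)
  local rs = index.ranges
  local lo, hi, k = 1, #rs, nil
  -- Binary search for the last range whose start is at or before (row, col).
  while lo <= hi do
    local mid = math.floor((lo + hi) / 2)
    if pos_lt(row, col, rs[mid][1], rs[mid][2]) then
      hi = mid - 1
    else
      k, lo = mid, mid + 1
    end
  end
  -- Any containing range is k or one of its ancestors; climb until one contains the point.
  while k and not contains(rs[k], row, col) do
    k = index.parent[k]
  end
  return k and rs[k] or nil
end
```

Everything lives in plain Lua arrays, so there is no per-node allocation after the build; whether this actually beats a linear scan on small buffers would need measuring.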

To sidestep the combined-tree problem, it's possible to collect results from each LanguageTree, group them by language, and show the groups one after another. Since the trees are traversed from top to bottom, filling the groups in the order parsers are encountered should make the order of groups in the aerial view sensible most of the time. Who knows, this might end up being the preferred view for most users.
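A sketch of the grouping step, assuming each LanguageTree's results arrive as a hypothetical { lang = ..., symbols = {...} } table in top-to-bottom traversal order. Languages keep the order in which they were first encountered, and all symbols of one language end up in a single group, so it no longer matters where a combined injection's regions actually sit.

```lua
-- Group per-tree results by language, preserving first-encounter order.
-- `reports` is a hypothetical list of { lang = string, symbols = table } entries.
local function group_by_language(reports)
  local groups, by_lang = {}, {}
  for _, report in ipairs(reports) do
    local group = by_lang[report.lang]
    if not group then
      -- First time we see this language: open a new group at the end.
      group = { lang = report.lang, symbols = {} }
      by_lang[report.lang] = group
      groups[#groups + 1] = group
    end
    for _, sym in ipairs(report.symbols) do
      group.symbols[#group.symbols + 1] = sym
    end
  end
  return groups
end
```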

PS: If I say "tree" one more time...

Slotos, Dec 01 '23 08:12