neovim
neovim copied to clipboard
feat(treesitter): handle quantified nodes for injection ranges
Allows injection ranges to span quantified node captures. I had made this PR mostly for the sake of convenience but I have also noticed it's made injections more robust in certain ways:
- Empty injection ranges are ignored (see the modification in
parser_spec; previously, the ranges were not counted before because they were empty, so I had to change the offset amount) - Most use cases for
injection.combinedare now obsolete- The only cases where it would still be necessary (that I can think of) are when you really need a combined injection range and you cannot capture both nodes for that combination in just a single query.
- Combined injection ranges span the in-between area more accurately. See the below images:
Before:
After:
Question: Was the previous behavior intended? Perhaps there are cases where parsing the content in-between the combined range (rather than just combining the nodes themselves and parsing only those) leads to undesirable highlighting? I couldn't find an answer to that question from the documentation, but it is very specific. Perhaps a reviewer wiser than myself can answer it
I will add tests later, if this PR is deemed acceptable
Ok, some more thoughts. Currently, injection.combined is pretty buggy. If I wanted to have the following type of injection:
(arguments
(string
(string_content) @injection.content)
(#set! injection.combined)
(#set! injection.language "c"))
the entire injection has the same range, leading to highlights bleeding into other areas, which is undesirable. See the following image:
However, as this PR stands now, there is no easy way to achieve injection ranges for the above. #27504 would help, but it would still only be giving an ugly way to solve this problem (we would have to calculate an offset for everything, and this can be bad because who knows if, e.g. the strings above will or will not have a trailing comma?). In my opinion, the proper way forward would be to use quantified captures in place of what injection.combined used to be needed for, and keep injection.combined for special cases.
Special cases would apply to, e.g., the html_tag injection shown in the PR description. In the lua arguments example, we don't want to inject the range in-between each node, since that would mean the string delimiters and trailing commas also get injected. However, we do want the text in-between to be part of the injection for the HTML tags in Markdown, because then we get a nice highlight for the text inside the tags (which doesn't exist now).
TLDR: I would like to propose a new meaning for injection.combined. For
other NODETYPE other
other NODETYPE other
The query
(NODETYPE)+ @injection.content
should inject:
other ******** other
other ******** other
and
((NODETYPE)+ @injection.content (#set! injection.combined))
should inject:
other **************
************** other
@clason @justinmk @lewis6991 does this sound like an acceptable change?
Question: Was the previous behavior intended? Perhaps there are cases where parsing the content in-between the combined range (rather than just combining the nodes themselves and parsing only those) leads to undesirable highlighting? I couldn't find an answer to that question from the documentation, but it is very specific. Perhaps a reviewer wiser than myself can answer it
It does lead to undesirable highlighting, but not from neovim core's side. nvim-treesitter had issues filed before related to php highlighting getting overridden by html.
I think we should first ask upstream to decide, document, and implement the official behavior of injections for quantified nodes (like we did for #any-of? and friends). This ensures that the behavior is standardized across editors so we can share injection queries.
Relevant ideas from Helix here: https://github.com/helix-editor/helix/discussions/11157; that's a bit orthogonal to how quantified nodes should be handled.
Can we -- as a first step -- just add handling of quantified nodes and leave any changes to @injection.combined to a followup after the upstream discussion is concluded?
Sure, but how should the behavior be handled? Should consecutive nodes have their injection ranges merged (e.g. text between two tags)? Or separate?
I think as you propose them to be handled. (Basically, quantified injections should be the "better injection.combined", as you say.)
We don't need to handle all situations out of the box, it's enough to handle some (most common) situations correctly, and better than before. It's ok to leave other, more contentious, situation unhandled and unspecified (UB).
Cool I can do that. One last thing to mention: if I do that and if the injection.combined changes ever get merged, it would require another change to the quantified captures to go from a combined range to a non-combined range (just the sum of the individual node ranges). Since the range combination would be handled by the predicate. I don't know if that change would qualify as breaking though, or if it is it's at least low impact hopefully. Still ok to go through with this PR?
I think there's two different issues here:
- global scope combination: all injected nodes for a language get treated as one single document -- this is what
@injection.combineddoes. - local scope combination: all captured nodes in a pattern get treated as a combined node -- this doesn't exist yet and is what this proposal is about.
Once you have both, you have to decide on how the combination behaves. Easy answer for now: not at all; just don't use it yet. Less easy (but most compatible) answer: Same as before: all captured nodes get added to the "global" context, whether they are quantified or not. This would already improve the situation (and not be breaking).
Your "extended range" version is a third option, but while I see how that would be useful, I would like to keep it out of the scope for now until the upstream discussion has concluded. This would punt any breaking changes to a single future PR.
Does that make sense?
Just to be on clear side, here's what I suggest for this PR:
Assume you have
other NODETYPE other
other NODETYPE other
something other entirely
other NODETYPE other
other NODETYPE other
Then (numbers denote distinct contexts for parsing)
(NODETYPE) @injection.contentshould capture
other 11111111 other
other 22222222 other
something other entirely
other 33333333 other
other 44444444 other
(NODETYPE)+ @injection.contentshould capture
other 11111111 other
other 11111111 other
something other entirely
other 22222222 other
other 22222222 other
((NODETYPE)+ @injection.content)(#set! injection.combined)should capture
other 11111111 other
other 11111111 other
something other entirely
other 11111111 other
other 11111111 other
(which is the same as ((NODETYPE) @injection.content)(#set! injection.combined)).
(Assuming quantified nodes actually capture non-adjacent matches, which I'm not 100% sure about.)
And any behavior that involves extending ranges beyond the nodes itself should be left to a dedicated directive (TBD). Basically, this PR should only be about adding option 2, which so far has been missing but would cover (correctly) ~90% of the current (often incorrect) uses of injection.combined. This would then not be a breaking change.
Sorry for the back and forth, but I am fairly confident that this proposal (which I believe matches yours from https://github.com/neovim/neovim/pull/29601#issuecomment-2212737235) is what matches upstream intention and will address most current issues (as you write, the current global injection.combined does not work properly in a number of situations, e.g., https://github.com/nvim-treesitter/nvim-treesitter/issues/6530 or https://github.com/nvim-treesitter/nvim-treesitter/issues/7013).
It will not cover embedded HTML tags, but those are another kettle of fish entirely and probably require a bespoke solution. (In your first example, what would happen if one of the tags got deleted for some reason?)
Sounds good, I'll get on it. As for the later point, at that point maybe the bespoke solution would render my initial injection.combined proposal unnecessary. Gonna postpone giving it more thought for now
Ok, I think this will require more effort. Looks like there is no notion of a global or local injection scope right now, injection.combined just makes it so that the ranges are themselves all combined (which incidentally parses it as the same document, and also leads to the highlighting artifacts). Some scope implementation will have to be added for this to work
which incidentally parses it as the same document
Not incidentally; that's how it's intended to work ;)
It works differently in the CLI though. Something like "injecthere${butnothere}injecthere" would have the two "injecthere" strings (with injection.combined) parsed as one document, and the ${} block will retain its own highlights. I.e., document combination is attained without range combination. In neovim the whole range gets combined and ${} gets weird highlights
Ah, ok, that was a misunderstanding then. CLI is correct and we are not. We need to convert the range to a region (table of ranges) then.
Yeah. But this is how it is already for non-combined injections. So the only notion of injection combination right now is whether the actual range is contiguous. So for that to work there needs to be some scoping or some other index to refer to different trees of the same language (I think)
Yeah, I think that was an early assumption that was never revisited. I agree that this requires a bigger rewrite, unfortunately. How does the CLI (or, better, Helix) handle this? Do they manually stitch the ranges together (how?) or let the tree-sitter handle this (somehow)?
Not sure, I'll investigate