tree-sitter
🐛 Hidden rule hides single anonymous string literal token
Problem
Lexical precedence is currently achieved by wrapping prec() with token().
The problem is that if the rule is hidden and contains a string literal, it can no longer be captured.
Moreover, it seems to break the string literal when used on its own in other rules.
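A minimal sketch of the pattern described above, for illustration only. The rule names `_fat_comma` and `pair` and the `'=>'` literal are hypothetical, and `lit`, `sym`, `token`, `prec`, and `seq` are simplified stand-ins for the tree-sitter DSL so the snippet runs on its own under Node. In a real grammar.js the DSL functions are provided by the tree-sitter CLI and the rules sit inside `grammar({...})`. The sketch shows only the shape of the rule, not the reported capture failure itself.

```javascript
// Simplified stand-ins for the tree-sitter DSL globals (assumption: in a real
// grammar.js these are provided by the tree-sitter CLI, not defined by hand).
const lit = (r) => (typeof r === 'string' ? { type: 'STRING', value: r } : r);
const sym = (name) => ({ type: 'SYMBOL', name });
const token = (rule) => ({ type: 'TOKEN', content: lit(rule) });
const prec = (value, rule) => ({ type: 'PREC', value, content: lit(rule) });
const seq = (...members) => ({ type: 'SEQ', members: members.map(lit) });

// Hypothetical rules; the names and the '=>' literal are illustrative only.
const rules = {
  // Hidden rule (leading underscore) that gives '=>' lexical precedence.
  _fat_comma: () => token(prec(2, '=>')),
  // A different rule that uses the same literal on its own.
  pair: () => seq(sym('key'), '=>', sym('value')),
};

// The hidden rule's body is a TOKEN wrapping a PREC wrapping the literal,
// so the literal is no longer the outermost node of the rule.
const hidden = rules._fat_comma();
console.log(hidden.type, '>', hidden.content.type, '>', hidden.content.content.type);
```

Because the literal ends up nested inside the new token, a query that matches the bare `"=>"` string has nothing to bind to when the rule is hidden, which is the behavior this issue reports.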
Solution
The currently available solution (apart from unhiding the node) is aliasing it back to the literal.
I propose introducing a prec.lexical function which, like prec, will not mask its contents.
References
https://github.com/tree-sitter-perl/tree-sitter-perl/pull/114#issuecomment-1682420642
https://github.com/nvim-treesitter/nvim-treesitter/pull/5301#issuecomment-1689045251
If it would be more comfortable for you then you may patch JS like:
prec.lex = function (_prec, rule) { return token(prec(_prec, rule)); };
That would be a bodge.
I think improving the docs about lexical precedence is better than introducing another function that, although it might be a bit clearer, is unnecessary and does exactly the same thing.
Tree-sitter's DSL was built around the idea of being minimal and providing combinable actions where possible.
Many grammars add their own abstractions on top of the base DSL actions.
If it would be more comfortable for you then you may patch JS like:
prec.lex = function (_prec, rule) { return token(prec(_prec, rule)); };
I think you're missing the main issue here. The problem is that in order to use lexical precedence, you need to use the token directive. This makes TS create a new token for the contents, which you then are unable to match in a query. In order to work around this, you need to then alias the literal token BACK TO ITSELF. I refer to what I wrote here https://github.com/tree-sitter-perl/tree-sitter-perl/pull/114/commits/2098bca9162e672d1e6be78418802d0f52be7f4d
In any case, the workaround that solves the issue would be
prec.lex = function (_prec, rule) { return alias(token(prec(_prec, rule)), rule); };
which smells to me like there should be better handling for this
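To make the shape of that workaround concrete, here is a small sketch that runs outside the tree-sitter CLI. The `lit`, `token`, `prec`, and `alias` functions are simplified stand-ins for the DSL (an assumption, they only mimic the nesting the real functions produce), and the `'=>'` literal and precedence value `2` are hypothetical.

```javascript
// Simplified stand-ins for the tree-sitter DSL globals (assumption: the real
// ones come from the tree-sitter CLI and perform more validation).
const lit = (r) => (typeof r === 'string' ? { type: 'STRING', value: r } : r);
const token = (rule) => ({ type: 'TOKEN', content: lit(rule) });
const prec = (value, rule) => ({ type: 'PREC', value, content: lit(rule) });
// Aliasing to a plain string is modeled here as an anonymous (unnamed) alias.
const alias = (rule, value) => ({ type: 'ALIAS', content: lit(rule), named: false, value });

// Workaround from the comment above: wrap for lexical precedence, then alias
// the resulting token back to the literal's own text.
prec.lex = function (_prec, rule) {
  return alias(token(prec(_prec, rule)), rule);
};

// Hypothetical usage with an illustrative literal and precedence value.
const node = prec.lex(2, '=>');
console.log(JSON.stringify(node, null, 2));
```

The outermost node is now an alias whose name is the literal text, which is why the literal becomes matchable again. Note that this helper only makes sense when `rule` is a string literal, since the same value is reused as the alias name.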
In order to work around this, you need to then alias the literal token BACK TO ITSELF.
We discussed this with @amaanq and we think it's a bug and it will be fixed.
that is, only if the rule is hidden
It's also a documentation issue. It's only briefly mentioned on the website and not in the prec() or token() function documentation. Ideally, the website should also explain why token(prec()) represents lexical precedence.