fix: empty _command_list_body and list_body
Make it easy for https://github.com/blindFS/topiary-nushell/issues/39#issuecomment-3568844310
Also fixes a minor issue of empty command_list
Although this PR introduces 6 more conflicts, it hardly hurts lex state counts and WASM comp time, should be safe to merge.
@mkatychev, sorry to ping you on an irrelevant PR. I'm confused at the failed CI pipeline, do you happen to know what's going on?
I'll look more into it tomorrow but I've encountered similar messages and it was an issue with the package-lock.json
@blindFS I finally narrowed down the problem, until https://github.com/tree-sitter/node-tree-sitter/pull/258 is merged, node isn't compatible with 0.25 basically since there's breaking ABI changes that haven't been addressed.
So either temporarily drop node support or temporarily downgrade the ABI.
@blindFS I finally narrowed down the problem, until tree-sitter/node-tree-sitter#258 is merged, node isn't compatible with 0.25 basically since there's breaking ABI changes that haven't been addressed.
So either temporarily drop node support or temporarily downgrade the ABI.
Thanks a lot, I'll disable test-node for now.
BTW, it will be super helpful if you review this PR in your spare time.
A lot of conflicts for this minor edge case is kinda silly, but I really run out of ideas here.
The basic idea behind this PR is that, previously, there's a list_body node in empty lists like
[
,
]
which covers the commas and newlines between brackets, it's somewhat annoying for topiary, so I make this kind of empty body contents anonymous in this PR.
The general_body_rule function helps to generate such kind of repeated patterns with separators in between, especially helpful when extra newlines are allowed.
ping me when you're ready to land it.
ping me when you're ready to land it.
Sure
@blindFS I'll post my general comments initially:
- getting rid of the
general_body_rulesfunction would help a lot with indirection - adding newlines to
extraswould remove the need for most of this PR (ignoring regressions in other rules ofc)[^1].
I managed to greatly simplify the $._collection_body node while adding additional support for
commas inside of collection types without regressions in https://github.com/nushell/tree-sitter-nu/pull/235:
- https://github.com/nushell/tree-sitter-nu/blob/32cd5db1b3092636249b2df4a9369201c8b13174/grammar.js#L316-L316
In general handling whitespace explicitly (the choice(punc().comma, /\s/) below) does not need to be done as tree sitter nodes[^0] consider whitespace delimitation between nodes as opt-out [^1]:
- https://github.com/nushell/tree-sitter-nu/blob/47d4b4f5369c0cae866724758ae88ef07e10e4f1/grammar.js#L1106-L1106
With the whitespace mention above general_body_rules does not feel justified and removing it would greatly simplify a lot of the rulesets (in my opinion).
Ideally a list node should have a simple definition:
list: ($) => seq('[', repeat1(choice( ',', $._list_entry, '\n'))),
...this could further be reduced by keeping newlines as part of extras
and having their presence define rules[^2] (such as end of function or statement) be explicitly opt in: extras: ($) => [/s/, $.comment],
[^0]: named or anonymous (like ,)
[^1]: https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar.html#using-extras
[^2]: token.immediate(rule)
@blindFS I finally narrowed down the problem, until tree-sitter/node-tree-sitter#258 is merged, node isn't compatible with 0.25 basically since there's breaking ABI changes that haven't been addressed. So either temporarily drop node support or temporarily downgrade the ABI.
Thanks a lot, I'll disable
test-nodefor now.BTW, it will be super helpful if you review this PR in your spare time.
A lot of conflicts for this minor edge case is kinda silly, but I really run out of ideas here.
The basic idea behind this PR is that, previously, there's a
list_bodynode in empty lists like[ , ]which covers the commas and newlines between brackets, it's somewhat annoying for topiary, so I make this kind of empty body contents anonymous in this PR.
The
general_body_rulefunction helps to generate such kind of repeated patterns with separators in between, especially helpful when extra newlines are allowed.
In topiary's case I think turning an [ , ] into [] ( through @delete) would be reasonable (I'm assuming that's what you're trying to do) and should not conflict with some changes.
@mkatychev Thanks a lot for the reviewing. The removing of newline in extras #139 does seem to cause a lot of trouble.
As I recall, it mainly solves the issue of multiline binary-op.
That issue might be solvable with some careful tuning of precedence. I'll try it on the old grammar.js before #139 but I feel pessimistic about it. I'll let you know if I run into troubles.
Please let me know if you have issues, you can generally get away with this kind of thing without resorting to scanner.c so long as you don't maintain state (unlike something like python where indentation is "stateful").
Ideally a list node should have a simple definition: list: ($) => seq('[', repeat1(choice( ',', $._list_entry, '\n'))),
You mean getting rid of list_body? That will be a breaking change affecting many downstream projects. And we should do the same for all xxx_body nodes.
@mkatychev Oh, I think I remember what was the problem:
There's a fundamental difference between
(foo
bar # bar is the argument of command foo
)
foo
bar # a different command
It seems the only way to differentiate them is to specify newlines in an explicit way.
Two possibilities:
Explicit \n in parenthesized rules to force continuation of the matching of current pipe-element
That's what it looked like before #139, It makes everything with higher precedence difficult, for example:
(
ls | where $it.name != "foo"
and $it.name != "bar"
)
Explicit \n in non-parenthesized blocks as a terminator between piplelines
This probably is what you would expect. However I have trouble making this work as expected
1
# comment
| $in
The best bet I'm aware of is moving the $._pipe_separator rule to scanner.c so it can have a higher precedence than the \n in $._terminator, but still the comments in-between are pretty annoying. Any idea?
I think moving terminator to the external scanner is a simple and feasible way. Closing this for now.
I see your case, I think the terminator case is most similar to python/justfile indentation:
Hi @mkatychev, I've managed to fix most of the _terminator related problems mentioned above in this branch
Then I found another trickier problem:
To prevent echo 192.168.0.1 from being parsed as a command with 4 numerical arguments.
I think I don't have much options but to put explicit spaces as separators between command arguments.
(Moving unquoted rules to ex scanner might be another option, but that's huge amount of work, bc all conflicted rules must be moved too)
However the spaces introduced in explicit JS rules always have higher precedence than those of extras, even with token(prec(lowest_num, /[ \t]+/)). (I think tree-sitter should allow finer-grained control over that kind of stuff.)
Then the following code will be parsed incorrectly, as a block with single command of 3 string arguments:
{
print foo
print foo
}
BC the spaces before the 2nd print trick the parser to continue with the command_rule before trying to match the newline with _terminator.
Would you mind elaborating on echo 192.168.0.1? I'm confused as to how full stops are involved
Would you mind elaborating on
echo 192.168.0.1? I'm confused as to how full stops are involved
Actually I'm not quite sure how it ended up like that (after conflict resolving), but the parser thought 192, .168, .0 and .1 are 4 number arguments.