tree-sitter crashes because of U+FEFF in test file
When running tree-sitter test on this repo, the command will hang and eat more and more memory until it crashes.
In the test case, there is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) character before the test code. When removing it, the test passes.
I also tried with tree-sitter-cli compiled from the git master branch instead of from the version published on npmjs.com, but results were identical.
Actually, it turns out that while the U+FEFF is a hard to spot problem, it doesn't have to be something this obscure. I added another test case, which also shows the problem:
============================================
Crashing test
============================================
{}
---
(data)
I haven't had a chance yet to clone this locally and reproduce, but I think you're probably hitting https://github.com/tree-sitter/tree-sitter/issues/98. I just now remembered to change the title of that issue, since it was originally pretty obscure.
I noticed that there's what looks to be an empty string token in your grammar: https://github.com/th-we/tree-sitter-crash-demo/blob/master/grammar.js#L7. Is that intentional? The usual way to write that construct with Tree-sitter would be:
data: $ => $._quoted_value,
_quoted_value: $ => seq('"', optional($.value), '"'),
value: $ => /a[^"]*/
That way, you wouldn't get nodes of zero size.
While it's true that with optional() the problem does not occur, the tree would change, i.e. there would be no value node for the empty value (""). I used to include the quotes in the value nodes, but those nodes should be the injection point for another grammar.
I see the following workarounds:
- Include the quotes in the injected grammar. But that would be a bit like including
<script>tags in the JavaScript grammar. - Introduce a special
emtpy_valuenode i.e. like
It's inconsistent to include the quotes in one case and exclude them in another, but in the grammar cson file it can at least be made sure that the quotes are always highlighted the same.data: $ => choice($._quoted_value, $.empty_value), _quoted_value: $ => seq('"', optional($.value), '"'), value: $ => /a[^"]*/, emtpy_value: $ => '""'
There might be more tree-sitterish ways to solve this.
And by the way: Many thanks for this work of yours. It gives me basic linting for an obscure language with a number of pitfalls, some of which I can catch using tree-sitter now.