tree-sitter-dart icon indicating copy to clipboard operation
tree-sitter-dart copied to clipboard

Segmentation Fault and crashed neovim

Open Mofiqul opened this issue 3 years ago • 39 comments
trafficstars

Crashing neovim completely, Segmentation Fault (Core dumped).

Mofiqul avatar May 29 '22 15:05 Mofiqul

+1, I can replicate when attempting to type at the bottom of the file.

rlch avatar May 31 '22 02:05 rlch

Related open issue in neovim

https://github.com/nvim-treesitter/nvim-treesitter/issues/2972

Workaround For now add this commit

f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97

to lockfile.json in nvim-treesitter plugin. And TSInstall dart

genesistms avatar May 31 '22 04:05 genesistms

The issue is in scanner.c somewhere, reverting it to the version at f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97 fixes the crash

Kavantix avatar Jun 01 '22 09:06 Kavantix

I made a quick fork where I changed the version (feel free to use it until it's fixed) https://github.com/RobertBrunhage/nvim-treesitter

RobertBrunhage avatar Jun 06 '22 08:06 RobertBrunhage

Just creating an empty dart file and adding one non-white-space character without a trailing semicolon is enough to cause the segmentation fault. If we write the semicolon first and then insert anything before it the parser won't crash.

As simple as this:

printf "p" > konstant.dart && nvim konstant.dart => [SIGSEV]

printf "p;" > konstant.dart && nvim konstant.dart => [works just fine]

CarlosNihelton avatar Jun 06 '22 19:06 CarlosNihelton

Same issue here: Reverting to the older revision works :)

ErikReider avatar Jun 08 '22 13:06 ErikReider

@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last return false; in tree_sitter_dart_external_scanner_scan to return lexer->lookahead == '\0'. This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.

Kavantix avatar Jun 09 '22 20:06 Kavantix

@UserNobody14 I've played around a bit with the scanner and the only way so far I have been able to prevent a segfault without a semi colon at the end is by changing the last return false; in tree_sitter_dart_external_scanner_scan to return lexer->lookahead == '\0'. This however really sounds like a bug in tree sitter itself we stumbled upon, or it could even be in the grammar.

Hey! I don't have idea of why, but it worked for me as well! @UserNobody14 is this a real fix or would it be just masking some other issue? Unfortunately the commit that introduced the bug is very big to allow attempts to fix it without a deep understanding of tree-sitter and Dart language specification.

CarlosNihelton avatar Jun 10 '22 00:06 CarlosNihelton

Let me try adding that fix and running the grammar on a large list of files again, as far as I know it shouldn't cause a big issue. If it works, I'll merge it

UserNobody14 avatar Jun 11 '22 20:06 UserNobody14

Hmmm. After looking into the problem I can't seem to either replicate it or get the solution you posted to work with our tests. I don't have neovim setup with dart, so I tried running all of the tests I traditionally use, like parsing files and such. Sadly I was not able to recreate any of these issues, which I believe indicates it only happens on reruns of the grammer. Unfortunately without an easy way to replicate it I just don't know what I should do.

Near as I can tell the "lookahead == '\0'" line causes it to return true while the string has not yet ended, but this causes nearly all of our tests to fail, and on other projects' scanner.c files, the return false line is still present, so I don't see why it should be causing such an issue.

I found that this configuration below successfully runs the tests, but without an easy way to see if it segfaults, i have no way of knowing if it actually fixes the problem. Does anyone have some easy steps for reproducing the issue?

                                                  const bool *valid_symbols) {
  // bool ret = false;
  // if (lexer->lookahead == '/') {
  //   return scan_multiline_comments(lexer);
  // }
  if (lexer->lookahead == '\0') {
    return false;
  }
  if (
      valid_symbols[TEMPLATE_CHARS_DOUBLE] ||
      valid_symbols[TEMPLATE_CHARS_SINGLE] ||
      valid_symbols[TEMPLATE_CHARS_DOUBLE_SINGLE] ||
      valid_symbols[TEMPLATE_CHARS_SINGLE_SINGLE]
  ) {
    return scan_templates(lexer, valid_symbols);
  }
  if (valid_symbols[AUTOMATIC_SEMICOLON]) {
      bool ret = scan_automatic_semicolon(lexer);
      return ret;
  }
  while (iswspace(lexer->lookahead)) lexer->advance(lexer, true);

  if (lexer->lookahead == '/') {
    return scan_multiline_comments(lexer);
  }
  return false;
}```

UserNobody14 avatar Jun 11 '22 21:06 UserNobody14

You don't need a Neovim setup. Just let the tree-sitter parser run. Instructions https://github.com/ikatyang/tree-sitter-markdown/issues/14 (CC needs to be clang and python in the script python2). I've provided sample inputs that let the parser crash in the original issue, but I would recommend running the fuzzer (maybe even on CI)

theHamsta avatar Jun 11 '22 23:06 theHamsta

If this is really a bug in tree-sitter, you could have a look at our open issue with tree-sitter-prisma which crashes without a custom parser @Kavantix . There is even a closed issue for tree-sitter-pug which hd the same issue that could be fixed by a check for \0

theHamsta avatar Jun 11 '22 23:06 theHamsta

@theHamsta could you link the issues you mentioned?

@UserNobody14 I reproduced it by running tree-sitter highlight on a file containing a few random characters and no semicolon at the end. E.g. dhdh. So no need for using vim while testing

Kavantix avatar Jun 12 '22 09:06 Kavantix

@Kavantix here you go: https://github.com/nvim-treesitter/nvim-treesitter/issues/2972#issuecomment-1141491120 As you say, no need to use vim. I bisected using git rebase -exec '...' using the setup in https://github.com/ikatyang/tree-sitter-markdown/issues/14 . There are some problems with the queries in this repo, so I had to modify the fuzzing main a bit to skip checking the queries. (basically skip all usages of https://github.com/tree-sitter/tree-sitter/blob/b729029a403046375a77a8e320211da64da00629/test/fuzz/fuzzer.cc#L7 in this file

I should really open a PR to document how to setup fuzzing in upstream tree-sitter. We found quite a lot parser issue with tree-sitter's fuzzing setup.

theHamsta avatar Jun 13 '22 22:06 theHamsta

Is the current workaround is the one offered by @GenesisTMS , to revert to SHA f95876f0ed3ef207bbd3c5c41987bc2e9cecfc97 of this repo?

This is a manual step so all new users of Neovim + TS + Dart will have to perform it on their own. Would it be possible to revert the changes done on this repo to that SHA instead?

https://github.com/nvim-treesitter/nvim-treesitter/issues/2972 which was referenced earlier as the matching TS issue is now marked as closed.

ghost avatar Jul 07 '22 06:07 ghost

Coming to this from nvim-treesitter.

Adding the following to test/corpus/dart.txt replicates the issue with tree-sitter test:

=====================
Weird file
=====================


d

---------------

This should at least help debugging the test.

The reason of the segfault is still a mistery to me, and the scanner.c is cryptic to me, especially scan_templates which does some kind of weird loop things.

vigoux avatar Jul 07 '22 10:07 vigoux

Changing the scanner to always return false immediately gives the same crash, so it is not because of the scanner

Kavantix avatar Jul 07 '22 10:07 Kavantix

Now that's interesting...

vigoux avatar Jul 07 '22 10:07 vigoux

Any chance to revert back to the working commit until the problem is solved?

rphuber avatar Jul 08 '22 10:07 rphuber

nvim-treesitter now uses an older working version for now: https://github.com/nvim-treesitter/nvim-treesitter/commit/d99b4cdd7591ab2e3ff3f962f7cc518908375805

theHamsta avatar Jul 13 '22 19:07 theHamsta

@theHamsta a fuzzing setup guide would be tremendously appreciated, I'd love to fuzz my grammar.

ahelwer avatar Jul 13 '22 22:07 ahelwer

@ahelwer Can't count how often I linked this post https://github.com/ikatyang/tree-sitter-markdown/issues/14#issue-725850717

theHamsta avatar Jul 13 '22 22:07 theHamsta

@theHamsta maybe we could host a wiki page for this, although the comment is good, debugging the error is a bit more complicated than just running the fuzzer :)

kyazdani42 avatar Jul 16 '22 09:07 kyazdani42

Ideally such a documentation is contributed to tree-sitter upstream. On how to create safe, fuzzed parsers.

theHamsta avatar Jul 23 '22 18:07 theHamsta

I guess so. I will look at this next week

kyazdani42 avatar Jul 24 '22 11:07 kyazdani42

Alright I got a commit working with @Kavantix 's highlight dhdh file as well as @vigoux 's "weird file" test (thanks btw!). Unfortunately it's still hard for me to be sure it won't crash again. Let me know if it keeps happening with the latest changes.

I'm not a genius user of scanner.c, sadly, I just kinda copied and pasted from various other tree sitter repos until I got it working. If anyone knowledgeable of C and tree-sitter wants to take a crack at it, I'd be happy to assist/merge your commits.

UserNobody14 avatar Aug 01 '22 22:08 UserNobody14

Seems to be ok now. Fuzzed for 55mins. Can the others please confirm that this is gone now also in Neovim?

theHamsta avatar Aug 01 '22 23:08 theHamsta

I can confirm that this issue has been fixed on neovim, although it seems that the speed for the first load is back to normal

fitrh avatar Aug 02 '22 02:08 fitrh

also, the flutter project does not crash now for me

ayoubelmhamdi avatar Aug 02 '22 03:08 ayoubelmhamdi

For me the crash still happens on 53485a8f301254e19c518aa20c80f1bcf7cf5c62

Kavantix avatar Aug 31 '22 07:08 Kavantix