tree-sitter-javascript icon indicating copy to clipboard operation
tree-sitter-javascript copied to clipboard

Fix: Comment optimization should not cause automatic-semi insert for single-line block

Open jackschu opened this issue 1 year ago • 0 comments

Problem

Today the following piece of code is valid but it is parsed incorrectly:

class C {
  name/*comment*/() {
  }
}

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript:

The output of tree-sitter parse is the following:

program [0, 0] - [4, 0]
  class_declaration [0, 0] - [3, 1]
    name: identifier [0, 6] - [0, 7]
    body: class_body [0, 8] - [3, 1]
      member: field_definition [1, 2] - [1, 6]
        property: property_identifier [1, 2] - [1, 6]
      comment [1, 6] - [1, 17]
      member: method_definition [1, 17] - [2, 3]
        name: MISSING property_identifier [1, 17] - [1, 17]
        parameters: formal_parameters [1, 17] - [1, 19]
        body: statement_block [1, 20] - [2, 3]

This is because name/*comment*/() { was being parsed as having an automatic semicolon between name and (), but this isn't right.

This was because there was an optimization that would cause automatic semicolons to be considered valid if a comment was started. This overlooked single line block comments.


Diff summary

Turns out the problem was really nuanced:

There are two callers of scan_whitespace_and_comments,

  1. an optimizing caller who wants to decide that having seen a comment is enough to consider auto-semi-insertion legal.
  2. another who just wants to march the lexer forward maximally.

This PR basically gives the ability for the first of these to be invoked (ie because the automatic semicolon scanner saw a comment) and not result in a decision on semicolon-legality in particularly the case where only a single-line-block-comment is seen.

It does this by

  1. changing scan_whitespace_and_comments to have an enum return value to indicate its three results (no-decision, reject, accept)
  2. giving scan_whitespace_and_comments a consume:false mode where it stops as soon as it pops out of (a series of) block comments

Also added tests to the corpus

jackschu avatar Jun 30 '24 20:06 jackschu