tree-sitter-javascript

Inline comments can result in mis-parsed return, break, and continue statements

Open jackschu opened this issue 1 year ago • 0 comments

The following piece of code is valid but it is parsed incorrectly:

function foo() {
    return /**/ 1;
}

Here's a link to the TypeScript Playground showing that the snippet above is valid JavaScript or TypeScript:

https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABMOcAUBKRBvAUIgxAJwFMoQikB6AKhqsQEYBuXAXyA
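As a quick runtime check alongside the Playground link, something like the sketch below can be run in Node or a browser console. The function names `inlineComment` and `multilineComment` are made up for illustration. It relies on the ECMAScript rule that a block comment without a line terminator behaves like whitespace, while a block comment that contains a line terminator counts as a line break and so triggers automatic semicolon insertion after `return`.

```javascript
// The comment sits on one line, so `1` is still the return value.
function inlineComment() {
  return /**/ 1;
}

// The comment spans a line break, so a semicolon is inserted after
// `return` and the trailing `1;` becomes a separate, unreachable statement.
function multilineComment() {
  return /*
  */ 1;
}

console.log(inlineComment());    // 1
console.log(multilineComment()); // undefined
```

Only the second form should produce a bare return_statement followed by a separate expression_statement; the first form is the one this issue is about.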

The output of tree-sitter parse is the following:

(program [0, 0] - [3, 0]
  (function_declaration [0, 0] - [2, 1]
    name: (identifier [0, 9] - [0, 12])
    parameters: (formal_parameters [0, 12] - [0, 14])
    body: (statement_block [0, 15] - [2, 1]
      (return_statement [1, 4] - [1, 10])
      (comment [1, 11] - [1, 15])
      (expression_statement [1, 16] - [1, 18]
        (number [1, 16] - [1, 17])))))

Notice that the expression_statement which points to the text 1 is not a child of the return_statement.
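To spot this pattern in a larger tree, a rough heuristic along the following lines might help. This is only a sketch: `findSplitStatements` and `SPLITTABLE` are hypothetical names, and the code assumes node objects that expose `type`, `namedChildren`, `startPosition`, and `endPosition` in the way the tree-sitter JavaScript bindings do. It only catches the sibling layout shown in the output above, and it would also flag legitimate code such as `return; /**/ 1;`.

```javascript
// Statement types whose optional tail appears to get split off (per this report).
const SPLITTABLE = new Set([
  "return_statement",
  "break_statement",
  "continue_statement",
]);

// Collects [statement, comment, orphan] triples of adjacent siblings that sit
// on the same row, searching recursively below `node`.
function findSplitStatements(node, hits = []) {
  const kids = node.namedChildren || [];
  for (let i = 0; i + 2 < kids.length; i++) {
    const stmt = kids[i];
    const comment = kids[i + 1];
    const orphan = kids[i + 2];
    const sameRow =
      stmt.endPosition.row === comment.startPosition.row &&
      comment.endPosition.row === orphan.startPosition.row;
    if (
      SPLITTABLE.has(stmt.type) &&
      comment.type === "comment" &&
      orphan.type === "expression_statement" &&
      sameRow
    ) {
      hits.push([stmt, comment, orphan]);
    }
  }
  for (const kid of kids) findSplitStatements(kid, hits);
  return hits;
}
```

With a real parse, the root node of the tree would be passed in as `node`; each returned triple points at a statement that probably lost its operand or label.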

This gets especially weird for the parent's bounding box: in this example, the if_statement is selected, but its range does not include the return value. [image]

This also seems to affect break and continue statements capturing their labels. For example, bar here should be parsed as a label (a statement_identifier), but it isn't; instead it becomes an expression_statement:

function foo() {
    while (true) {
        if (true)
            break /**/ bar
    }
}
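Note that the snippet above never declares a `bar` label, so a JavaScript engine would reject it with an early error even though the grammar-level parse is what matters here. A self-contained variant that actually runs might look like the sketch below; `firstNegative` and the `outer` label are illustrative names, not part of the original report. It shows that an inline comment between `break` or `continue` and the label is accepted, and that the label is still honored.

```javascript
// Returns the first negative value; a zero abandons the rest of its row.
function firstNegative(rows) {
  let found = null;
  outer: for (const row of rows) {
    for (const value of row) {
      // The comment sits between the keyword and its label.
      if (value === 0) continue /* skip this row */ outer;
      if (value < 0) {
        found = value;
        break /* leave both loops */ outer;
      }
    }
  }
  return found;
}

console.log(firstNegative([[1, 0, -9], [2, -4], [-6]])); // -4
```

If the labels were dropped at runtime the way they are in the parse tree, the call would log -9 or -6 instead.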

It seems like what's happening here is this: if a statement in a statement block can end early (i.e., it ends with optional parts), an inline comment can trick the parser into thinking the statement has already ended, leaving the now-unaccounted-for optional suffix to be parsed as a separate expression statement.

jackschu, Jun 18 '24 03:06