sql-formatter

Nearley integration

Open nene opened this issue 3 years ago • 0 comments

That's an initial attempt at replacing our current parser with Nearley. Just trying it out and seeing what kind of problems arise.

Discovered several problems:

  • [x] Case-sensitivity: Nearley matches literals against the token.text field, which in our case (and in Moo) holds the original text from the source file. We should swap this around, so that token.text holds the canonical/clean value to match against, while some other field stores the original string.
  • [x] Semicolons: Currently the semicolon is just another operator. We should turn it into a separate token type, which would make it simpler to distinguish from other operators.
  • [ ] Ambiguity: our current parser simply goes with the first rule that matches, while Nearley tries to match all possible rules, which leads to ambiguity.
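A minimal sketch of what the first two items could look like on the tokenizer side. The `raw` field, the `KEYWORD` and `DELIMITER` type names, and the `normalizeToken` helper are all hypothetical and only illustrate the idea; they are not taken from the existing code.

```typescript
// Hypothetical token shape: `text` is the canonical value the grammar
// matches against, `raw` keeps the string exactly as written in the source.
interface Token {
  type: string;
  text: string;
  raw: string;
}

// Hypothetical helper, assuming the tokenizer already knows the token type.
function normalizeToken(type: string, raw: string): Token {
  // Semicolons get their own type instead of being just another operator.
  if (raw === ";") {
    return { type: "DELIMITER", text: ";", raw };
  }
  // Assumption: only keywords are case-insensitive; everything else is kept as-is.
  const text =
    type === "KEYWORD" ? raw.toUpperCase().replace(/\s+/g, " ") : raw;
  return { type, text, raw };
}
```

With something like this, a grammar literal such as "SELECT" would match `select`, `Select` and `SELECT` alike, while the formatter could still read the original casing from `raw`.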

The first two are simple to fix, but the ambiguity problem is a major one.

For example, I'd like to parse both function calls and plain parenthesized blocks. If I write a grammar like this:

main -> expression:*

expression -> function_call | parenthesis | plain_token

function_call -> %IDENT parenthesis

parenthesis -> "(" expression:* ")"

plain_token -> %IDENT | %NUMBER | %STRING | %OPERATOR

and try to parse a string foo(), then Nearley will return two parse results:

  • a function_call
  • a plain_token followed by parenthesis

To solve this problem we would need to write a more complex grammar that only allows IDENT followed by ( inside a function call.
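To make the ambiguity concrete, here is a small hand-written sketch that enumerates every parse of a token sequence under the grammar above. It is not Nearley and does not use its API. The `restrictIdent` flag is a hypothetical stand-in for the stricter grammar: it refuses to treat an IDENT as a plain_token when the next token is "(". A real Nearley grammar has no lookahead, so there the same restriction would have to be encoded in the rules themselves.

```typescript
type Tok = "IDENT" | "NUMBER" | "STRING" | "OPERATOR" | "(" | ")";
type Result = { tree: string; end: number };

// Returns a function that lists all complete parses of a token sequence.
function makeParser(restrictIdent: boolean) {
  // expression:* -> every way to consume zero or more expressions from pos
  function many(toks: Tok[], pos: number): Result[] {
    const out: Result[] = [{ tree: "", end: pos }];
    for (const first of expression(toks, pos)) {
      for (const rest of many(toks, first.end)) {
        out.push({ tree: (first.tree + " " + rest.tree).trim(), end: rest.end });
      }
    }
    return out;
  }

  // parenthesis -> "(" expression:* ")"
  function parenthesis(toks: Tok[], pos: number): Result[] {
    if (toks[pos] !== "(") return [];
    return many(toks, pos + 1)
      .filter((r) => toks[r.end] === ")")
      .map((r) => ({ tree: `parenthesis(${r.tree})`, end: r.end + 1 }));
  }

  // expression -> function_call | parenthesis | plain_token
  function expression(toks: Tok[], pos: number): Result[] {
    const out: Result[] = [];
    const t = toks[pos];
    if (t === "IDENT") {
      // function_call -> %IDENT parenthesis
      for (const p of parenthesis(toks, pos + 1)) {
        out.push({ tree: `function_call(${p.tree})`, end: p.end });
      }
    }
    out.push(...parenthesis(toks, pos));
    if (t !== undefined && t !== "(" && t !== ")") {
      // With the restriction on, IDENT directly before "(" is never a plain_token.
      const blocked = restrictIdent && t === "IDENT" && toks[pos + 1] === "(";
      if (!blocked) out.push({ tree: "plain_token", end: pos + 1 });
    }
    return out;
  }

  return (toks: Tok[]): string[] =>
    many(toks, 0)
      .filter((r) => r.end === toks.length)
      .map((r) => r.tree);
}

const fooCall: Tok[] = ["IDENT", "(", ")"]; // tokens of foo()
const ambiguous = makeParser(false)(fooCall);
const restricted = makeParser(true)(fooCall);
console.log(ambiguous.length, restricted.length); // prints "2 1"
```

The unrestricted parser finds both a function_call and a plain_token followed by parenthesis, matching the two results listed above, while the restricted one leaves only the function_call.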

nene avatar Jul 04 '22 21:07 nene