sql-formatter
Nearley integration
This is an initial attempt at replacing our current parser with Nearley. I'm just trying it out to see what kinds of problems arise.
Discovered several problems:
- [x] Case-sensitivity: Nearley matches against the `token.text` field, which in our case (and in Moo) is the original text in the source file. We should change it around, so that `token.text` would be the canonical/clean value to match against while some other field would store the original string.
- [x] Semicolons: Currently the semicolon is just another operator. We should turn it into a separate token type, which would simplify distinguishing it from other operators.
- [ ] Ambiguity: our current parser simply goes with the first rule that matches, while Nearley tries to match all possible rules, leading to ambiguity.
The first two are simple to fix, but the ambiguity problem is a major one.
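As a rough illustration of the two simple fixes, here is a minimal TypeScript sketch. The `raw` field, the `KEYWORD` and `SEMICOLON` type names, and the `normalizeToken` helper are all assumptions made for this example, not the actual tokenizer code.

```typescript
// Hypothetical token shape: `text` is the canonical value the grammar
// matches against, `raw` keeps the original source text for output.
interface Token {
  type: string;
  text: string;
  raw: string;
}

// Assumed token type names; the real tokenizer may name these differently.
const KEYWORD = "KEYWORD";
const OPERATOR = "OPERATOR";
const SEMICOLON = "SEMICOLON";

function normalizeToken(type: string, raw: string): Token {
  // Semicolons get their own token type instead of being one more operator.
  if (type === OPERATOR && raw === ";") {
    return { type: SEMICOLON, text: ";", raw };
  }
  // Keywords are matched case-insensitively via an uppercased `text`.
  if (type === KEYWORD) {
    return { type, text: raw.toUpperCase(), raw };
  }
  // Everything else is matched as written.
  return { type, text: raw, raw };
}

const selectToken = normalizeToken(KEYWORD, "select");
console.log(selectToken.text, selectToken.raw); // prints "SELECT select"
```

With this shape the grammar can match on `text` alone, while the formatter can still decide later whether to print `raw` or a re-cased keyword.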
For example, I'd like to parse both function calls and plain blocks of parentheses. If I write a grammar like this:
main -> expression:*
expression -> function_call | parenthesis | plain_token
function_call -> %IDENT parenthesis
parenthesis -> "(" expression:* ")"
plain_token -> %IDENT | %NUMBER | %STRING | %OPERATOR
and try to parse the string `foo()`, then Nearley will return two parse results:
- a `function_call`
- a `plain_token` followed by `parenthesis`
To solve this problem, we would need to write a more complex grammar that only allows `IDENT` followed by `(` inside a function call.
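To make the ambiguity and the proposed restriction concrete, here is a small TypeScript sketch. It does not use Nearley. It is a hand-written enumerator that returns every derivation of the grammar above, plus a `strict` flag that stops `plain_token` from matching an `IDENT` directly followed by `(`. The `OPEN_PAREN`/`CLOSE_PAREN` type names and all function names are assumptions for illustration only.

```typescript
type Tok = { type: string; text: string };
type Node =
  | { kind: "function_call"; name: string; args: Node[] }
  | { kind: "parenthesis"; children: Node[] }
  | { kind: "plain_token"; text: string };
type Result<T> = { value: T; next: number };

const PLAIN_TYPES = ["IDENT", "NUMBER", "STRING", "OPERATOR"];

// "(" expression:* ")" : returns every way to parse the children.
function parseParenChildren(toks: Tok[], pos: number, strict: boolean): Result<Node[]>[] {
  if (toks[pos]?.text !== "(") return [];
  const out: Result<Node[]>[] = [];
  for (const inner of parseMany(toks, pos + 1, strict)) {
    if (toks[inner.next]?.text === ")") {
      out.push({ value: inner.value, next: inner.next + 1 });
    }
  }
  return out;
}

// expression -> function_call | parenthesis | plain_token (all alternatives tried)
function parseExpression(toks: Tok[], pos: number, strict: boolean): Result<Node>[] {
  const out: Result<Node>[] = [];
  const tok: Tok | undefined = toks[pos];
  if (!tok) return out;
  if (tok.type === "IDENT") {
    for (const p of parseParenChildren(toks, pos + 1, strict)) {
      out.push({ value: { kind: "function_call", name: tok.text, args: p.value }, next: p.next });
    }
  }
  for (const p of parseParenChildren(toks, pos, strict)) {
    out.push({ value: { kind: "parenthesis", children: p.value }, next: p.next });
  }
  // In strict mode an IDENT directly followed by "(" is not a plain token.
  const identBeforeParen = tok.type === "IDENT" && toks[pos + 1]?.text === "(";
  if (PLAIN_TYPES.includes(tok.type) && !(strict && identBeforeParen)) {
    out.push({ value: { kind: "plain_token", text: tok.text }, next: pos + 1 });
  }
  return out;
}

// expression:* : zero or more expressions, every possible split.
function parseMany(toks: Tok[], pos: number, strict: boolean): Result<Node[]>[] {
  const out: Result<Node[]>[] = [{ value: [], next: pos }];
  for (const first of parseExpression(toks, pos, strict)) {
    for (const rest of parseMany(toks, first.next, strict)) {
      out.push({ value: [first.value, ...rest.value], next: rest.next });
    }
  }
  return out;
}

// main -> expression:* : keep only derivations that consume all tokens.
function parseAll(toks: Tok[], strict: boolean): Node[][] {
  return parseMany(toks, 0, strict)
    .filter((r) => r.next === toks.length)
    .map((r) => r.value);
}

const fooCall: Tok[] = [
  { type: "IDENT", text: "foo" },
  { type: "OPEN_PAREN", text: "(" },
  { type: "CLOSE_PAREN", text: ")" },
];
console.log(parseAll(fooCall, false).length); // prints 2
console.log(parseAll(fooCall, true).length); // prints 1
```

The enumerator is exponential in the worst case and is only meant to show where the second derivation comes from. In an actual Nearley grammar the same restriction would have to be expressed through the rule structure rather than a lookahead flag.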