nearley
Grammar does not work when Moo is used - but works without
I'm using the nearley grammar (and parser) with the moo tokeniser. My grammar.ne file is the following:
@{%
const moo = require('moo')
let lexer = moo.compile({
number: /[0-9]+/
});
%}
@lexer lexer
trig -> "sin" [0-9]:+
When passing the string "sin8" to the parser via nearley-test grammar.js -i "sin8", I get the following error:
throw new Error(this.formatError(token, "invalid syntax"))
^
Error: invalid syntax at line 1 col 1:
sin8
^
at Lexer.next (C:\Users\Florian\WebstormProjects\MineScript\node_modules\moo\moo.js:397:13)
at Parser.feed (C:\Users\Florian\AppData\Roaming\npm\node_modules\nearley\lib\nearley.js:270:30)
at Object.<anonymous> (C:\Users\Florian\AppData\Roaming\npm\node_modules\nearley\bin\nearley-test.js:83:12)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
However, commenting out the @lexer lexer line makes it work, and the parser matches the "sin8" string.
This example is taken directly from the documentation and does not work, so I'm wondering where I went wrong. I know I'm missing something, because I can't get Moo to work with Nearley correctly.
So when using a custom lexer, are string literals equivalent to using the % notation? In other words, is %sin like "sin"?
Not quite; "sin" matches a token with the value sin, whereas %foo matches a token with the type foo.
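To make the distinction concrete, here is a small dependency-free sketch of that matching rule. It does not use nearley or moo: the `matches` helper, the symbol objects, and the `keyword` token type are hypothetical stand-ins, and the only assumption is that a token carries a `type` and a `value`.

```javascript
// Hypothetical model of the rule described above; this is not nearley's own code.
// A grammar symbol is either a literal ("sin") or a token type (%number).
function matches(symbol, token) {
  if (symbol.literal !== undefined) {
    // "sin" in the grammar: compare against the token's value.
    return token.value === symbol.literal;
  }
  // %number in the grammar: compare against the token's type.
  return token.type === symbol.type;
}

// Tokens shaped like a lexer's output for the input "sin8".
// The type name "keyword" is an arbitrary choice for illustration.
const sinToken = { type: "keyword", value: "sin" };
const numToken = { type: "number", value: "8" };

const results = {
  // "sin" matches because the token's value is "sin".
  literalSin: matches({ literal: "sin" }, sinToken),
  // %sin does not match: the token's type is "keyword", not "sin".
  typeSin: matches({ type: "sin" }, sinToken),
  // %number matches because the token's type is "number".
  typeNumber: matches({ type: "number" }, numToken),
  // "sin" does not match a token whose value is "8".
  literalOnNumber: matches({ literal: "sin" }, numToken),
};
console.log(results);
```

In this model, %sin and "sin" only coincide when the lexer happens to name the token type after its text.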
# Use %token to match any token of that type instead of "token":
multiplication -> %number %ws %times %ws %number {% ([first, , , , second]) => first * second %}
# Literal strings now match tokens with that text:
trig -> "sin" %number
I've also been having issues with nearley / moo so I tried to see what is going on with this example.
It was my (now I see incorrect) intuition that the tokeniser would kick in as directed by nearley when a token was specified. For example, as in the example, I expected:
main -> "sin" %number
to match the literal "sin" and then the token %number. From what I gather from this post, what "sin" means changes when a lexer is used. Instead of meaning a literal string, it means any token that has the computed value "sin". Thus, if the lexer defines no token that covers that text, parsing fails with an error.
Yep, I also struggled with this confusion. It happens because the Moo lexer tokenises the input before Nearley's grammar rules ever see it (not alongside them). So when Moo hits text it has no rule for, it throws these baffling syntax errors, which then surface through Nearley. Remember that a lexer is basically a first-pass classifier that groups your raw characters and symbols into somewhat usable atoms for the Nearley parser.
So, bottom line: you probably want to get Moo working on its own first (make sure none of your raw keywords, symbols, etc. throw syntax errors) and THEN assemble the tokens into higher-level constructs using Nearley. Toby Ho has a number of good YouTube videos that show how to get all this working.
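As a rough illustration of checking the lexer in isolation, the sketch below uses a tiny hand-written tokenizer rather than moo itself. The `tokenize` function and its error message are hypothetical and far simpler than moo's implementation; it only shows why "sin8" fails at column 1 when the sole rule is `number`, and why adding a rule that covers "sin" fixes it.

```javascript
// Minimal stand-in for a lexer (hypothetical; moo's real implementation differs).
// `rules` maps a token type to a regular expression.
function tokenize(rules, input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, pattern] of Object.entries(rules)) {
      // Sticky flag: the pattern must match exactly at `pos`.
      const re = new RegExp(pattern.source, "y");
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        tokens.push({ type, value: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule covers this character, so lexing fails before the
      // parser receives any token for it.
      throw new Error(`invalid syntax at col ${pos + 1}`);
    }
  }
  return tokens;
}

// Only a number rule, as in the original grammar: the "s" matches nothing.
let failure = null;
try {
  tokenize({ number: /[0-9]+/ }, "sin8");
} catch (err) {
  failure = err.message;
}

// With a rule that covers "sin", the same input lexes cleanly.
const tokens = tokenize({ sin: /sin/, number: /[0-9]+/ }, "sin8");
console.log(failure, tokens);
```

With moo, the analogous change would presumably be to add a rule for the text sin next to number in the moo.compile call, so that the "sin" literal in the grammar has a token to match.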
Many people (myself included) dread lexers because they think they have to write massive definition files to cover every possible case, but things like the "not" operator make it much easier.