Grammar does not work when Moo is used - but works without

Open TheMrZZ opened this issue 6 years ago • 6 comments

I'm using the nearley grammar (and parser) with the moo tokeniser. My grammar.ne file is the following:

@{%
    const moo = require('moo')
    let lexer = moo.compile({
        number: /[0-9]+/
    });
%}

@lexer lexer

trig -> "sin" [0-9]:+

When passing the string "sin8" to the parser via nearley-test grammar.js -i "sin8", I get the following error:

throw new Error(this.formatError(token, "invalid syntax"))
      ^

Error: invalid syntax at line 1 col 1:

  sin8
  ^
    at Lexer.next (C:\Users\Florian\WebstormProjects\MineScript\node_modules\moo\moo.js:397:13)
    at Parser.feed (C:\Users\Florian\AppData\Roaming\npm\node_modules\nearley\lib\nearley.js:270:30)
    at Object.<anonymous> (C:\Users\Florian\AppData\Roaming\npm\node_modules\nearley\bin\nearley-test.js:83:12)
    at Module._compile (module.js:652:30)
    at Object.Module._extensions..js (module.js:663:10)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)
    at Function.Module.runMain (module.js:693:10)
    at startup (bootstrap_node.js:191:16)

However, commenting out @lexer lexer makes it work, and the parser matches the "sin8" string. This example is taken directly from the documentation and does not work, so I'm wondering where I went wrong. I must be missing something, because I can't get Moo to work with nearley correctly.

TheMrZZ • Oct 28 '18 15:10

You haven't declared a rule for sin in your Tokenizer. 🙂
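
To see why the error is reported at col 1, here is a minimal sketch of what a tokenizer does with this input. The tokenize helper below is a hypothetical stand-in written for illustration; it is not Moo's implementation or API, and its error message is only shaped like the one in the report. With number as the only rule, the first character s matches nothing; once a sin rule is declared, the input splits into two tokens.

```javascript
// Simplified stand-in for a tokenizer (assumption: NOT Moo's code or API).
// `rules` maps a token type to a RegExp; the first rule that matches at the
// current offset wins.
function tokenize(rules, input) {
  const tokens = [];
  let offset = 0;
  while (offset < input.length) {
    let matched = false;
    for (const [type, pattern] of Object.entries(rules)) {
      // The sticky flag ("y") forces the match to start exactly at `offset`.
      const re = new RegExp(pattern.source, "y");
      re.lastIndex = offset;
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        tokens.push({ type, value: m[0] });
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies here, so the tokenizer gives up before any grammar
      // rule gets a chance to look at the input.
      throw new Error(`invalid syntax at col ${offset + 1}`);
    }
  }
  return tokens;
}

// Only `number` is declared, as in the original grammar: "s" matches no rule.
let firstError = null;
try {
  tokenize({ number: /[0-9]+/ }, "sin8");
} catch (err) {
  firstError = err.message;
}
console.log(firstError);

// Declaring a rule for "sin" lets the whole input tokenize.
const tokens = tokenize({ sin: /sin/, number: /[0-9]+/ }, "sin8");
console.log(tokens);
```

In the grammar's moo.compile call, the corresponding change would be an extra entry such as sin: 'sin' next to number.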

tjvr • Oct 28 '18 21:10

So when using a custom lexer, string literals are equivalent to using the % notation? In other words, %sin is like "sin"?

TheMrZZ • Oct 30 '18 04:10

Not quite; "sin" matches a token with the value sin, whereas %foo matches a token with the type foo.
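
The distinction can be modeled in a few lines. This is a simplified sketch, not nearley's source: the matchesSymbol helper and the token objects are assumptions made for illustration, with { literal } standing for a quoted string in the grammar and { type } for a %name reference.

```javascript
// Simplified model (assumption, not nearley's actual code) of how a grammar
// symbol is compared with a token once a lexer is in use.
function matchesSymbol(symbol, token) {
  if (symbol.literal !== undefined) {
    // "sin" in the grammar: compare against the token's value.
    return token.value === symbol.literal;
  }
  // %number in the grammar: compare against the token's type.
  return token.type === symbol.type;
}

// Hypothetical tokens, as a lexer might emit them.
const sinToken = { type: "sin", value: "sin" };
const keywordToken = { type: "keyword", value: "sin" };
const numberToken = { type: "number", value: "8" };

const results = [
  matchesSymbol({ literal: "sin" }, sinToken),     // value is "sin"
  matchesSymbol({ literal: "sin" }, keywordToken), // type is ignored for literals
  matchesSymbol({ type: "number" }, numberToken),  // type is "number"
  matchesSymbol({ type: "number" }, sinToken),     // wrong type
];
console.log(results);
```

Under this model, a literal matches regardless of which token type carries the text, while a %name reference matches regardless of the text. In both cases the lexer must have produced the token first.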

tjvr • Oct 30 '18 08:10

# Use %token to match any token of that type instead of "token":
multiplication -> %number %ws %times %ws %number {% ([first, , , , second]) => first * second %}

# Literal strings now match tokens with that text:
trig -> "sin" %number

tjvr • Oct 30 '18 08:10

I've also been having issues with nearley / moo so I tried to see what is going on with this example.

It was my intuition (incorrect, I now see) that the tokeniser would kick in as directed by nearley when a token was specified. For instance, in this example I expected:

main -> "sin" %number

to match the literal "sin" and then the token %number. From what I gather from this post, the meaning of "sin" changes when a lexer is used. Instead of a literal string, it means any token whose value is "sin". Thus, if the lexer has no rule that produces such a token, parsing fails.

Zemnmez • Dec 29 '18 19:12

Yep, I also struggled with this. The confusion arises because the Moo lexer tokenizes the input before nearley's grammar rules see it, not alongside them. So when Moo hits input it cannot tokenize, it throws these baffling syntax errors, which then surface through nearley. Remember that a lexer is basically a first-pass classifier that groups your raw characters and symbols into usable atoms for the nearley parser.

So, bottom line: you probably want to get Moo working on its own first, i.e. make sure all your raw keywords, symbols, etc. tokenize without syntax errors, and THEN assemble them into higher-level constructs using nearley. Toby Ho has a number of good YouTube videos that show how to get all this working.

Many people (myself included) dread lexers because they think they have to write massive definition files to cover every possible case, but things like the "not" operator make it much easier.

ProjectAtlantis-dev • Mar 27 '20 14:03