
Custom token matchers example fails

Open · johndeighan opened this issue · 4 comments

On the page https://nearley.js.org/docs/tokenizers, there is a section 'Custom token matchers'. The example does not work. Here is my custom.ne file, copied verbatim:

@{%
const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
%}

main -> %tokenPrint %tokenNumber ";;"

# parser.feed(["print", 12, ";;"]);

I run nearleyc on it to produce custom.js:

// Generated automatically by nearley, version 2.15.1
// http://github.com/Hardmath123/nearley
(function () {
function id(x) { return x[0]; }

const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
var grammar = {
    Lexer: undefined,
    ParserRules: [
    {"name": "main$string$1", "symbols": [{"literal":";"}, {"literal":";"}], "postprocess": function joiner(d) {return d.join('');}},
    {"name": "main", "symbols": [tokenPrint, tokenNumber, "main$string$1"]}
]
  , ParserStart: "main"
}
if (typeof module !== 'undefined'&& typeof module.exports !== 'undefined') {
   module.exports = grammar;
} else {
   window.grammar = grammar;
}
})();

Then, here is my test.js file, in the same directory, to test the parser:

const nearley = require("nearley");
const grammar = require("./custom.js");

const grammarObj = nearley.Grammar.fromCompiled(grammar);
const parser = new nearley.Parser(grammarObj);
parser.feed(["print", 12, ";;"]);

And the output:

C:\Users\johnd\nearley-test\node_modules\nearley\lib\nearley.js:320
                throw err;
                ^

Error: invalid syntax at index 2
Unexpected ";;"

    at Parser.feed (C:\Users\johnd\nearley-test\node_modules\nearley\lib\nearley.js:317:27)
    at Object.<anonymous> (C:\Users\johnd\nearley-test\test.js:6:8)
    at Module._compile (module.js:653:30)
    at Object.Module._extensions..js (module.js:664:10)
    at Module.load (module.js:566:32)
    at tryModuleLoad (module.js:506:12)
    at Function.Module._load (module.js:498:3)
    at Function.Module.runMain (module.js:694:10)
    at startup (bootstrap_node.js:204:16)
    at bootstrap_node.js:625:3

Tool completed with exit code 1

Note that the example does not define a token for ';;'; instead it uses the string literally in the grammar definition. If that worked, it would be a good thing, because it could greatly simplify the grammar definition. But I think that may be the problem: it doesn't work.

In fact, if I define a tokenEnd:

const tokenEnd = { literal: ";;" };

and use it in the grammar definition:

main -> %tokenPrint %tokenNumber %tokenEnd

then the input is parsed correctly. Needing to do that, however, will make my grammar much more difficult to maintain.
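For reference, the whole custom.ne with that workaround looks roughly like this (a sketch only; it reuses the same test.js driver shown above):

@{%
const tokenPrint  = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
const tokenEnd    = { literal: ";;" };
%}

main -> %tokenPrint %tokenNumber %tokenEnd

# parser.feed(["print", 12, ";;"]);  now parses without an error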

johndeighan · Oct 29 '18

Having the same problem here. My lexer is throwing an error before my grammar, causing none of my previous rules to work.

skistaddy · Jun 15 '19

@kach in the docs, a lexer is made out to be something that you can add to look for small things, like primitive types and such. However, it seems like the real job of a lexer is to behave like a parser. Is this an accurate statement?

skistaddy · Jun 15 '19

Ok, I found that for the lexer to succeed, it just has to know about all the literals you're using. So in your case, you would just have to define ";;" in moo or some other lexer.
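For example (just a sketch, with made-up token names), a moo-based grammar that knows about ";;" could look like this:

@{%
const moo = require("moo");

// every literal the grammar uses has to be a token the lexer knows about
const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  end:    ";;",
  print:  "print",
});
%}

@lexer lexer

# whitespace is an explicit token here, since moo does not skip it by default
main -> %print %ws %number %end

With a moo lexer the parser is fed a plain string, e.g. parser.feed("print 12;;"), rather than an array of tokens.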

skistaddy · Jun 15 '19

I think the problem is that a string in a grammar is interpreted as a sequence of single-character tokens. So

main -> %tokenPrint %tokenNumber ";;"

is equivalent to

main -> %tokenPrint %tokenNumber ";" ";"

Indeed, ["print", 12, ";", ";"] is considered to be correct input.
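A quick way to check this, reusing custom.js and test.js from the first post, is to feed the two semicolons as separate tokens:

const nearley = require("nearley");
const grammar = require("./custom.js");

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

// each ";" is its own token, matching the two {"literal": ";"} symbols
// that the compiled main$string$1 rule expects
parser.feed(["print", 12, ";", ";"]);
console.log(parser.results);  // at least one parse result, no error thrown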

This behavior is useful when parsing raw strings, but not really useful when feeding an array of tokens to the parser and matching them with custom token matchers.

Willem3141 · Dec 20 '19