nearley
Custom token matchers example fails
On the page https://nearley.js.org/docs/tokenizers, there is a section 'Custom token matchers'. The example does not work. Here is my custom.ne file, copied verbatim:
@{%
const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
%}
main -> %tokenPrint %tokenNumber ";;"
# parser.feed(["print", 12, ";;"]);
I run nearleyc on it to produce custom.js:
// Generated automatically by nearley, version 2.15.1
// http://github.com/Hardmath123/nearley
(function () {
function id(x) { return x[0]; }
const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
var grammar = {
    Lexer: undefined,
    ParserRules: [
        {"name": "main$string$1", "symbols": [{"literal":";"}, {"literal":";"}], "postprocess": function joiner(d) {return d.join('');}},
        {"name": "main", "symbols": [tokenPrint, tokenNumber, "main$string$1"]}
    ],
    ParserStart: "main"
}
if (typeof module !== 'undefined' && typeof module.exports !== 'undefined') {
    module.exports = grammar;
} else {
    window.grammar = grammar;
}
})();
Then, here is my test.js file, in the same directory, to test the parser:
const nearley = require("nearley");
const grammar = require("./custom.js");
const grammarObj = nearley.Grammar.fromCompiled(grammar);
const parser = new nearley.Parser(grammarObj);
parser.feed(["print", 12, ";;"]);
And the output:
C:\Users\johnd\nearley-test\node_modules\nearley\lib\nearley.js:320
throw err;
^
Error: invalid syntax at index 2
Unexpected ";;"
at Parser.feed (C:\Users\johnd\nearley-test\node_modules\nearley\lib\nearley.js:317:27)
at Object.<anonymous> (C:\Users\johnd\nearley-test\test.js:6:8)
at Module._compile (module.js:653:30)
at Object.Module._extensions..js (module.js:664:10)
at Module.load (module.js:566:32)
at tryModuleLoad (module.js:506:12)
at Function.Module._load (module.js:498:3)
at Function.Module.runMain (module.js:694:10)
at startup (bootstrap_node.js:204:16)
at bootstrap_node.js:625:3
Tool completed with exit code 1
Note that the example does not define a token for ';;'; instead it uses the string literal directly in the grammar definition. If that worked, it would be a good thing, because it could greatly simplify the grammar definition. But I think that may be the problem: it doesn't work.
In fact, if I define a tokenEnd:
const tokenEnd = { literal: ";;" };
then use it in the grammar definition:
main -> %tokenPrint %tokenNumber %tokenEnd
the input is parsed correctly. Having to do that, however, makes my grammar much more difficult to maintain.
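Putting the workaround together, this is the complete custom.ne as I have it now (assembled from the snippets above; nothing else changed):

@{%
const tokenPrint = { literal: "print" };
const tokenNumber = { test: x => Number.isInteger(x) };
const tokenEnd = { literal: ";;" };
%}

main -> %tokenPrint %tokenNumber %tokenEnd

After recompiling with nearleyc, parser.feed(["print", 12, ";;"]) runs without the "Unexpected" error.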
Having the same problem here. My lexer is throwing an error before my grammar, causing none of my previous rules to work.
@kach in the docs, a lexer is made out to be something that you can add to look for small things, like primitive types and such. However, it seems like the real job of a lexer is to behave like a parser. Is this an accurate statement?
Ok, I found that in order for the lexer to pass, it just has to know all the literals you're using. So in your case, you would just have to define ";;" in moo or some other tokenizer.
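For what it's worth, here is a rough sketch of what that could look like with moo (the token names ws, number, print and end are just my own picks, and note that with a moo lexer you feed the parser a raw string instead of an array of tokens):

@{%
const moo = require("moo");

// hypothetical token names; adjust to your real grammar
const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  print:  "print",
  end:    ";;",
});
%}

@lexer lexer

main -> %print %ws %number %end

Then parser.feed("print 12;;") should get through, because the lexer now knows ";;" as a single token.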
I think the problem is that a string in a grammar is interpreted as a sequence of single-character tokens. So
main -> %tokenPrint %tokenNumber ";;"
is equivalent to
main -> %tokenPrint %tokenNumber ";" ";"
Indeed, ["print", 12, ";", ";"]
is considered to be correct input.
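As a sanity check (not a recommendation), the only change needed in the test.js above to make it run cleanly is splitting that last token:

const nearley = require("nearley");
const grammar = require("./custom.js");

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

// ";;" in the grammar expands to {"literal":";"} {"literal":";"},
// so the input has to supply two separate ";" tokens:
parser.feed(["print", 12, ";", ";"]);
console.log(parser.results); // one parse, no "Unexpected" error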
This behavior is useful when parsing raw strings, but not really useful when giving an array of tokens as input to the parser and parsing it with custom token matchers :confused: