Running out of memory on a small input?
I don't know what could be going on with this seemingly innocent grammar:
@{%
const mooLexer = require('moo').compile({
space: { match:/[ \t\n]+/, lineBreaks: true },
ident: {
match: /[a-z][a-zA-Z0-9_]*/,
keywords: { keyword: ['fn', 'tfn', 'match', 'ifz', 'forall', 'exists'] }
},
sig: 'SIG',
});
%}
@lexer mooLexer
Main -> _ Tp0 _
Tp1 -> %ident
| "(" _ Tp0 _ ")"
Tp0 -> Tp1
_ -> %space | null
but it appears to run out of memory pretty quickly when given the string "a" to parse, rather than correctly parsing "a" as a valid instance of the grammar.
Full example, check out and run "npm i && npm run test": https://github.com/robsimmons/badgram
Hi! Sorry to hear you're having trouble; thanks for providing details :-)
That's really interesting! I agree your grammar looks normal.
I don't have time to debug further right now; but please could you try Nearley 2.11.1? I'm wondering if it was the change we shipped at the weekend causing you issues.
Sent with GitHawk
Aaaahhh, yes: everything does work with Nearley 2.11.1. If I'd realized it was a recent update I woulda thought to try that. Thanks!
I've confirmed that 2.11.1 works for my larger case as well, and added a slightly larger but more realistic grammar example as a test case in the PR above. (Presumably the PR will not pass testing, since it demonstrates the regression 2.11.1 -> 2.12.1.)
Oh, boy, you're going to love this, Tim: your tokenizer-based compiler compiler is wrong!
$ nearleyc types.ne -o types-2.12.1.js
$ npm uninstall -g nearley
removed 10 packages in 0.169s
$ npm install -g [email protected]
…+ [email protected]
added 9 packages in 1.152s
$ nearleyc types.ne -o types-2.11.1.js
$ diff types-*.js
1c1
< // Generated automatically by nearley, version 2.11.1
---
> // Generated automatically by nearley, version 2.12.1
26c26,27
< {"name": "Tp4", "symbols": [{"literal":"("}, "_", "Tp", "_", {"literal":")"}]},
---
> {"name": "Tp4$subexpression$1", "symbols": ["_", "Tp", "_"]},
> {"name": "Tp4", "symbols": ["Tp4$subexpression$1"]},
It's reading the line Tp4 -> "(" _ Tp _ ")" two different ways… only one of which is correct by my reckoning.
At the very least, let's rollback to the old compiler ASAP. I'm really curious what your tokenizer is doing.
I just tried nearley for the first time today, and was at version 2.19.7.
Some small tests seemed good. So I tried a slightly larger grammar, and it choked. I noticed that removing a few characters would fix the problem (like changing the name of a token from length 5 to length 4).
So, I tested by copy-pasting the same rule a few times and it caused syntax errors. If I remove some copies, it compiles.
I tried to downgrade and repeat this procedure. The only version that could handle my larger grammar without breaking was v2.11.2
My grammar only had one rule,
Identifier ->
%Identifier {% function (_a) {
var identifier = _a[0];
return {
start: identifier.start,
end: identifier.end,
syntaxKind: parser_node_1.SyntaxKind.Identifier,
identifier: identifier.value,
quoted: false,
};
} %}
Then, I copy-pasted that rule 200+ times and nearleyc 2.12.0 until 2.19.7 started complaining about syntax errors.