nearley icon indicating copy to clipboard operation
nearley copied to clipboard

Running out of memory on a small input?

Open robsimmons opened this issue 8 years ago • 5 comments

I don't know what could be going on with this seemingly innocent grammar:

@{%

const mooLexer = require('moo').compile({
    space: { match:/[ \t\n]+/, lineBreaks: true },
    ident: {
        match: /[a-z][a-zA-Z0-9_]*/,
        keywords: { keyword: ['fn', 'tfn', 'match', 'ifz', 'forall', 'exists'] }
    },
    sig: 'SIG',
});

%}

@lexer mooLexer

Main    -> _ Tp0 _

Tp1     -> %ident
         | "(" _ Tp0 _ ")"

Tp0      -> Tp1

_       -> %space | null

but it appears to run out of memory pretty quickly when given the string "a" to parse, rather than correctly parsing "a" as a valid instance of the grammar.

Full example, check out and run "npm i && npm run test": https://github.com/robsimmons/badgram

robsimmons avatar Mar 06 '18 01:03 robsimmons

Hi! Sorry to hear you're having trouble; thanks for providing details :-)

That's really interesting! I agree your grammar looks normal.

I don't have time to debug further right now; but please could you try Nearley 2.11.1? I'm wondering if it was the change we shipped at the weekend causing you issues.

Sent with GitHawk

tjvr avatar Mar 06 '18 09:03 tjvr

Aaaahhh, yes: everything does work with Nearley 2.11.1. If I'd realized it was a recent update I woulda thought to try that. Thanks!

robsimmons avatar Mar 06 '18 12:03 robsimmons

I've confirmed that 2.11.1 works for my larger case as well, and added a slightly larger but more realistic grammar example as a test case in the PR above. (Presumably the PR will not pass testing, since it demonstrates the regression 2.11.1 -> 2.12.1.)

robsimmons avatar Mar 06 '18 19:03 robsimmons

Oh, boy, you're going to love this, Tim: your tokenizer-based compiler compiler is wrong!

$ nearleyc types.ne -o types-2.12.1.js
$ npm uninstall -g nearley
removed 10 packages in 0.169s
$ npm install -g [email protected]
…+ [email protected]
added 9 packages in 1.152s
$ nearleyc types.ne -o types-2.11.1.js
$ diff types-*.js
1c1
< // Generated automatically by nearley, version 2.11.1
---
> // Generated automatically by nearley, version 2.12.1
26c26,27
<     {"name": "Tp4", "symbols": [{"literal":"("}, "_", "Tp", "_", {"literal":")"}]},
---
>     {"name": "Tp4$subexpression$1", "symbols": ["_", "Tp", "_"]},
>     {"name": "Tp4", "symbols": ["Tp4$subexpression$1"]},

It's reading the line Tp4 -> "(" _ Tp _ ")" two different ways… only one of which is correct by my reckoning.

At the very least, let's rollback to the old compiler ASAP. I'm really curious what your tokenizer is doing.

kach avatar Mar 07 '18 00:03 kach

I just tried nearley for the first time today, and was at version 2.19.7.

Some small tests seemed good. So I tried a slightly larger grammar, and it choked. I noticed that removing a few characters would fix the problem (like changing the name of a token from length 5 to length 4).

So, I tested by copy-pasting the same rule a few times and it caused syntax errors. If I remove some copies, it compiles.

I tried to downgrade and repeat this procedure. The only version that could handle my larger grammar without breaking was v2.11.2


My grammar only had one rule,

Identifier ->
    %Identifier {% function (_a) {
    var identifier = _a[0];
    return {
        start: identifier.start,
        end: identifier.end,
        syntaxKind: parser_node_1.SyntaxKind.Identifier,
        identifier: identifier.value,
        quoted: false,
    };
} %}

Then, I copy-pasted that rule 200+ times and nearleyc 2.12.0 until 2.19.7 started complaining about syntax errors.

AnyhowStep avatar Oct 22 '20 00:10 AnyhowStep