nearley
nearley copied to clipboard
Very slow performance for lua-like grammer
Firstly - nearley is awesome 👍🏽, I was able to create a working grammer in no time at all! The biggest problem so far has been performance.
I'm parsing a lua-like grammer - it's almost identical to the example one, but I added a few extra features, and I'm using moo
as my lexer.
It's taking around 26 seconds to parse the input [.
- which is an invalid grammer. I'm taking a user's input, so sometimes they enter invalid stuff like that. Is there a way to speed this up, or how would I got about debugging this performance issue?
real 0m26.580s
user 0m18.580s
sys 0m1.570s
This is the output of the debug:
Error: Syntax error at line 1 col 2:
[.
^
Unexpected dot token: ".". Instead, I was expecting to see one of the following:
A rbracket token based on:
Array → %lbracket _ ● %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A ws token based on:
_ → _ ● %ws
Array → %lbracket ● _ ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A not token based on:
ExpUnary → ● %not __ ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A bang token based on:
ExpUnary → ● %bang ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A minus token based on:
ExpUnary → ● %minus _ ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A string token based on:
Atom → ● %string
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A boolean token based on:
Atom → ● %boolean
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A number token based on:
Atom → ● %number
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A lbracket token based on:
Array → ● %lbracket _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A lbracket token based on:
Array → ● %lbracket _ ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A lcurly token based on:
Object → ● %lcurly _ %rcurly
Atom → ● Object
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A lcurly token based on:
Object → ● %lcurly _ KeyValueList _ %rcurly
Atom → ● Object
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A lparen token based on:
Parenthesized → ● %lparen _ Exp _ %rparen
Atom → ● Parenthesized
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A name token based on:
ExpName → ● %name
Var → ● ExpName
PrefixExp → ● Var
Atom → ● PrefixExp
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A reference token based on:
Reference → ● %reference
ExpName → ● Reference
Var → ● ExpName
PrefixExp → ● Var
Atom → ● PrefixExp
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
A reference token based on:
Reference → ● %reference _ ReferenceSelectorList
ExpName → ● Reference
Var → ● ExpName
PrefixExp → ● Var
Atom → ● PrefixExp
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
ExpList → ● Exp
Array → %lbracket _ ● ExpList _ %rbracket
Atom → ● Array
ExpPow → ● Atom
ExpUnary → ● ExpPow
ExpProduct → ● ExpUnary
ExpSum → ● ExpProduct
ExpConcatenation → ● ExpSum _ %concat _ ExpConcatenation
ExpComparison → ● ExpConcatenation
ExpAnd → ● ExpComparison
ExpOr → ● ExpAnd
Exp → ● ExpOr
So I've found the part of the code that was taking a long time - it's actually spending almost all of the time creating the error "state stack". If I comment out the code (as below), the timing improves to 0.125s (from ~26 seconds) - not a bad improvement!
I actually only care about the error position (which I still get when commenting out the below). Maybe we can add an option to skip this extra error information? - as it takes a very long time to build.
real 0m0.125s
user 0m0.088s
sys 0m0.031s
Parser.prototype.reportError = function(token) {
var lines = [];
var tokenDisplay = (token.type ? token.type + " token: " : "") + JSON.stringify(token.value !== undefined ? token.value : token);
lines.push(this.lexer.formatError(token, "Syntax error"));
lines.push('Unexpected ' + tokenDisplay + '. Instead, I was expecting to see one of the following:\n');
var lastColumnIndex = this.table.length - 2;
var lastColumn = this.table[lastColumnIndex];
var expectantStates = lastColumn.states
.filter(function(state) {
var nextSymbol = state.rule.symbols[state.dot];
return nextSymbol && typeof nextSymbol !== "string";
});
// // Display a "state stack" for each expectant state
// // - which shows you how this state came to be, step by step.
// // If there is more than one derivation, we only display the first one.
// var stateStacks = expectantStates
// .map(function(state) {
// var stacks = this.buildStateStacks(state, []);
// return stacks[0];
// }, this);
// // Display each state that is expecting a terminal symbol next.
// stateStacks.forEach(function(stateStack) {
// var state = stateStack[0];
// var nextSymbol = state.rule.symbols[state.dot];
// var symbolDisplay = this.getSymbolDisplay(nextSymbol);
// lines.push('A ' + symbolDisplay + ' based on:');
// this.displayStateStack(stateStack, lines);
// }, this);
lines.push("");
return lines.join("\n");
};
Hmm interesting, it appears the fancy error reporting introduced some kind of performance regression? @airportyh do you know what's wrong?
Wow, that's definitely not intentional. @calummoore would you mind giving me the grammar you are using that exhibited this performance penalty? I'd like to look at it.
I am seeing a very similar issue where reportError takes 20 seconds to build the list of expected tokens, but for a different, probably much simpler grammar.
I am using the following grammar:
@lexer lexer
exp -> boolexp
boolexp ->
boolexp %And comp
| boolexp %Or comp
| comp
comp ->
comp %Equal sum
| comp %NotEqual sum
| comp %LessThan sum
| comp %LessThanEqual sum
| comp %GreaterThan sum
| comp %GreaterThanEqual sum
| sum
sum ->
sum %Plus product
| sum %Minus product
| product
product ->
product %Star unaryexp
| product %Slash unaryexp
| unaryexp
unaryexp ->
%Minus unaryexp
| %Not unaryexp
| atomicexp
atomicexp ->
intconst
# TODO: variables
# TODO: call
| %LBrace boolexp %RBrace
intconst -> %IntegerConst
As I am using a custom lexer, I guess you can't directly use it, but I hope the overall structure will be sufficient to debug this issue.
Furthermore, here is a perftrace from the Chrome DevTools
and the raw JSON report here which can be opened by Chrome here: slow-reportError.zip
From here on, I am mostly guessing as I am not familiar with Nearley or the error reporting algorithm. To me, this looks like an issue with too much recursion/too large fanout. buildParseStack calls itself recursively with a maximum call depth of ~15. The way I wrote my rules, I am probably introducing a large fanout on every recursion level. E.g., assuming that the error position is before the first token, I guess that
comp %Equal sum
| comp %NotEqual sum
| comp %LessThan sum
| comp %LessThanEqual sum
| comp %GreaterThan sum
| comp %GreaterThanEqual sum
| sum
will translate into 7 rules and buildStacks calls itself recursively for each one of them, even though the first 6 rules would all result in the same valid tokens. In the error message, this duplication is not present anymore, so I guess a later step removes duplicated rules again.
FYI: The error message in my case is
A Minus token based on:
unaryexp → ● %Minus unaryexp
product → ● unaryexp
sum → ● product
comp → ● sum
boolexp → ● comp
exp → ● boolexp
A Not token based on:
unaryexp → ● %Not unaryexp
product → ● unaryexp
sum → ● product
comp → ● sum
boolexp → ● comp
exp → ● boolexp
A LBrace token based on:
atomicexp → ● %LBrace boolexp %RBrace
unaryexp → ● atomicexp
product → ● unaryexp
sum → ● product
comp → ● sum
boolexp → ● comp
exp → ● boolexp
A IntegerConst token based on:
intconst → ● %IntegerConst
atomicexp → ● intconst
unaryexp → ● atomicexp
product → ● unaryexp
sum → ● product
comp → ● sum
boolexp → ● comp
exp → ● boolexp
A FloatConst token based on:
floatconst → ● %FloatConst
atomicexp → ● floatconst
unaryexp → ● atomicexp
product → ● unaryexp
sum → ● product
comp → ● sum
boolexp → ● comp
exp → ● boolexp
A StringConst token based on:
stringconst → ● %StringConst
atomicexp → ● stringconst
unaryexp → ● atomicexp
After slightly refactoring my grammar, buildStateStacks
now executes in less than 1 millisecond.
Restructuring the grammar in the following way works around the issues
@lexer lexer
exp -> boolexp
boolexp ->
boolexp boolop comp
| comp
comp ->
comp compop sum
| sum
sum ->
sum sumop product
| product
product ->
product productop unaryexp
| unaryexp
unaryexp ->
unaryop unaryexp
| atomicexp
atomicexp ->
intconst
| floatconst
| stringconst
| %LBrace boolexp %RBrace
boolop -> %And | %Or
compop -> %Equal | %NotEqual | %LessThan | %LessThanEqual | %GreaterThan | %GreaterThanEqual
sumop -> %Plus | %Minus
productop -> %Star | %Slash
unaryop -> %Minus %Not
intconst -> %IntegerConst
floatconst -> %FloatConst
stringconst -> %StringConst