Performance issue
I was trying to use this PEG to parse INI files:
ini <- (comment / section)+ `
lists:append(Node)
`;
section <- space? header space? (comment / config)* space? `
[_,_Header,_,ConfigLines|_] = Node,
[ CL || CL <- ConfigLines, CL =/= ignore ]
`;
header <- '[' (!']' .)+ ']' breakline? ~;
config <- key space? '=' space? (!(breakline / comment) .)* (breakline / comment)? `
[Key,_,_,_,Value|_] = Node,
{Key, iolist_to_binary(Value)}
`;
comment <- space? ';' (!breakline .)* breakline? `
ignore
`;
key <- [a-zA-Z0-9_\.]* `
iolist_to_binary(Node)
`;
space <- [ \t\n\s\r]+ ~;
breakline <- [\n\r]+ ~;
When I run this against a php.ini file with more than 1000 lines (most of them comment lines), it takes more than 5 seconds to parse the whole file. Other solutions I found (eini & zucchini on github.com, which use yrl and xrl files) take less than 1 second to parse the same file. What part of my code is wrong? Thanks.
The php.ini file is here: https://raw.githubusercontent.com/php/php-src/master/php.ini-production
What happens if you comment out your semantic actions (iolist_to_binary() etc.)?
@manuel-rubio There are known performance issues with large files and large grammars. Can you profile the parser to see where it is taking the most time? My gut suspicion is that the negative-lookahead+repeat is what's killing it (lots of backtracking).
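To illustrate the backtracking concern, here is a sketch of one possible rewrite (untested, and assuming neotoma accepts negated character classes like `[^...]` -- check the grammar docs for your version). The idea is that `(!breakline .)*` re-runs the `breakline` lookahead at every single character of every comment line, whereas a single character class consumes the run in one step:

```
% Sketch: replace per-character negative lookahead with a negated class.
% Original: comment <- space? ';' (!breakline .)* breakline? ...
comment <- space? ';' [^\n\r]* breakline? `
ignore
`;
```

If negated classes are not supported, the `!breakline .` form is unavoidable, but profiling should still confirm whether that loop (repeated over ~1000 comment lines) is where the time goes.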
@seancribbs The performance issue can be alleviated by getting rid of the ETS table and explicitly threading a dict/map through instead.
It would be quite easy to do, except that neotoma itself uses the memo table for some auxiliary information. For grammars that don't use the memo table explicitly, it should be an easy transformation, though.
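For the record, the map-threading idea could look roughly like this (a sketch only; `memoize/5` is a hypothetical helper, and neotoma's actual generated code differs in detail). Instead of ets:lookup/ets:insert on a shared table, each parse step takes the memo map as an argument and returns the updated map along with its result:

```erlang
%% Sketch: packrat memoization with an explicitly threaded map instead of ETS.
%% Key is {RuleName, Index}; ParseFun itself takes and returns the memo map.
memoize(Input, Index, RuleName, ParseFun, Memo0) ->
    Key = {RuleName, Index},
    case maps:find(Key, Memo0) of
        {ok, Result} ->
            {Result, Memo0};                       % cache hit: skip re-parsing
        error ->
            {Result, Memo1} = ParseFun(Input, Index, Memo0),
            {Result, maps:put(Key, Result, Memo1)} % cache and thread map onward
    end.
```

This avoids the per-call ETS overhead and keeps the parser state purely functional, at the cost of threading one extra argument through every generated rule function.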