grammars-v4
grammars-v4 copied to clipboard
[css3] Grammar very slow due to numerous ambiguities
See https://groups.google.com/g/antlr-discussion/c/8KQC7fwcXpQ
Using the latest ANTLR 4.13.2, with the official example repository's CSS3 grammar (https://github.com/antlr/grammars-v4/tree/master/css3), I generated a C++ parser to parse a simple CSS file of about 200 lines. Without adding any custom listener logic, on an M1 MacBook Pro, it takes 1 second for the first run! I'll consider that as a warm-up, but even after that, it takes 0.14 seconds each time! In contrast, my own homemade approach relying purely on string find only takes 0.002 seconds. Why is there such a performance difference? Is it because I'm using it incorrectly?
As with many of the grammars in the grammars-v4 repo, you never know what to expect.
The problem is that the css3 grammar is ambiguous. To debug the grammar, you can use "trperf" for a quick explanation of the ambiguity or "trparse --ambig" for a slow but detailed explanation of the ambiguity. These tools are from the Trash toolkit. https://github.com/kaby76/Trash. For "trparse --ambig", I had to pare down the input to a few hundred lines, otherwise it would take too long.
To name a few ambiguities, we have Decisions 55 in "ruleSet", 60 in "declaration", 98 in "fontFaceDeclaration", and 112 in "ws".
"ws" is particularly egregious because it is the most time-consuming expression. Using "trparse --ambig", and Bash diff, I found these rules to be a problem.
expr : term (operator_? term)* ;
term : number ws # knownTerm | percentage ws # knownTerm | dimension ws # knownTerm | String_ ws # knownTerm | UnicodeRange ws # knownTerm | ident ws # knownTerm | var_ # knownTerm | url ws # knownTerm | hexcolor # knownTerm | calc # knownTerm | function_ # knownTerm | unknownDimension ws # unknownTerm | dxImageTransform # badTerm ;
operator_ : '/' ws # goodOperator | Comma ws # goodOperator | Space ws # goodOperator | '=' ws # badOperator // IE filter and DXImageTransform function ;
For partial input
abbr[title] { border-bottom: 1px dotted; }
we see two parses for "1px",
examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws (Space " "))) (term (ident (Ident "dotted")) (ws))) examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws)) (operator_ (Space " ") (ws)) (term (ident (Ident "dotted")) (ws)))
(dotnet trparse -- --ambig examples/xxx.css | dotnet trxgrep -- ' //expr[.//Dimension[text()="1px"]]' | dotnet trtree -- -a)
A space should not be an operator!
Ken