grammars-v4
Change individual to grouped parsing
Testing in the grammars-v4 repo currently parses one input file per run of the test program, which I call individual parsing. We can save a good deal of time by parsing multiple input files in a single run of the test program, which I call grouped parsing. Here is what individual parsing looks like in the output from a build:
Testing java/java9
~/issues/issue-2988/before/java/java9/Generated ~/issues/issue-2988/before
bash test.sh
../examples/AllInOne8.java
Time: 00:00:05.9106526
Parse succeeded.
../examples/helloworld.java
Time: 00:00:00.5270543
Parse succeeded.
../examples/IdentifierTest.java
Time: 00:00:00.6478677
Parse succeeded.
../examples/Instanceof.java
Time: 00:00:00.2960225
Parse succeeded.
../examples/ManyStringsConcat.java
Time: 00:00:00.1571820
Parse succeeded.
../examples/module-info.java
Time: 00:00:00.0614978
Parse succeeded.
../examples/TryWithResourceDemo.java
Time: 00:00:00.7459264
Parse succeeded.
../examples/Unicode.java
Time: 00:00:00.5391444
Parse succeeded.
Duration: 0 hours 0 minutes 12 seconds
With grouped parsing, the run times for inputs after the first are much shorter (about 0.53 s in total for the last seven files, versus about 2.97 s with individual parsing):
Testing java/java9
~/issues/issue-2988/grammars-v4/java/java9/Generated-CSharp ~/issues/issue-2988/grammars-v4
bash test.sh
CSharp 0 ../examples/AllInOne8.java success 6.1742676
CSharp 1 ../examples/helloworld.java success 0.0255556
CSharp 2 ../examples/IdentifierTest.java success 0.1810686
CSharp 3 ../examples/Instanceof.java success 0.0866523
CSharp 4 ../examples/ManyStringsConcat.java success 0.0374591
CSharp 5 ../examples/module-info.java success 0.0031916
CSharp 6 ../examples/TryWithResourceDemo.java success 0.1530053
CSharp 7 ../examples/Unicode.java success 0.044574
Total Time: 6.987732
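To make the difference concrete, here is a minimal Python sketch of what a grouped driver does: it loops over all inputs inside one process, so start-up and warm-up costs are paid once, and it times each parse separately. This is not the trgen-generated driver. The names `parse_group` and `parse_one` are hypothetical, and the word printed for a failed parse is an assumption (the log above only shows `success`).

```python
import time
from typing import Callable, Iterable, List, Tuple


def parse_group(
    paths: Iterable[str],
    parse_one: Callable[[str], bool],  # hypothetical: returns True if the parse succeeded
    target: str = "CSharp",
) -> List[Tuple[str, bool, float]]:
    """Parse every input in the same process and time each parse separately."""
    results = []
    for index, path in enumerate(paths):
        start = time.perf_counter()
        ok = parse_one(path)
        elapsed = time.perf_counter() - start
        # One line per input, loosely modelled on the grouped log format above.
        # "fail" is an assumed label for an unsuccessful parse.
        status = "success" if ok else "fail"
        print(f"{target} {index} {path} {status} {elapsed:.7f}")
        results.append((path, ok, elapsed))
    return results
```

With individual parsing, the equivalent of `parse_one` would run in a fresh process for every file, so each input pays the start-up cost again.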
Grouped parsing will help for Java, CSharp, and probably a few other targets. But, it may not fix targets like PHP, which show no speed-up with warm-up. See https://github.com/antlr/antlr-php-runtime/issues/36.
Implementing grouped parsing requires the newest version of Trash trgen, and the templates in this repo will need to be updated.
There is still a significant amount of time required to generate and compile the drivers, so the builds will still be slow.
There is one problem with grouped (warm-up) parsing: the test script is designed to produce one .errors file and one .tree file per input when needed. For example, in antlr/antlr4/examples, the grammar three.g4 does not parse and should produce the file three.g4.errors. Likewise, in arithmetic/examples, there are several .tree files containing the parse trees for the inputs there.
I think the solution is an option that shunts each input's output to its own .errors and/or .tree file when parsing, instead of sending everything to stderr/stdout.
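As a rough illustration of that option, the sketch below writes each input's diagnostics and parse tree to files next to the input rather than to the shared streams. The function name `shunt_outputs` is hypothetical, and the suffixes follow the `three.g4.errors` example above; how the real driver would expose this (flag name, file encoding) is not decided here.

```python
from pathlib import Path
from typing import List, Optional


def shunt_outputs(
    input_path: str,
    error_text: str,
    tree_text: Optional[str],
) -> List[Path]:
    """Write <input>.errors and/or <input>.tree and return the paths written."""
    written = []
    if error_text:
        # Only create the .errors file when the parse reported something.
        errors_path = Path(input_path + ".errors")
        errors_path.write_text(error_text, encoding="utf-8")
        written.append(errors_path)
    if tree_text is not None:
        # Only create the .tree file when a tree was requested for this input.
        tree_path = Path(input_path + ".tree")
        tree_path.write_text(tree_text, encoding="utf-8")
        written.append(tree_path)
    return written
```

Because the file names are derived from the input path, a grouped run over many inputs still yields one .errors and one .tree file per input, which is what the existing test script expects.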