antlr4 Yes...Terrible Golang performance for 4.11.1

Yes...Terrible Golang performance for 4.11.1

Open kaby76 opened this issue 1 year ago • 15 comments

I am updating grammars-v4 for 4.11.1. Unfortunately, the asm8080 and asmZ80 grammars now do not terminate (actually killed after 5 minutes by the watchdog program trwdog). For 4.10.1, they terminated. Neither of these grammars contain target-specific code, and are really really simple.

Sep 08 '22 10:09 kaby76

Just in case I'm doing something wrong, I tested 4.11.1 on the SimpleBoolean issue that was reported. Indeed, 4.11.1 does fix that problem (fast!), but asm8080 now takes 376s for cpm22.asm.

Sep 09 '22 00:09 kaby76

I did a side-by-side comparison of the debug output from ParserATNSimulator between CSharp and Go targets for asm8080/cpm22.asm. The outputs are essentially identical which means they are computing the same results. But, clearly, the Go implementation is much much slower.

Sep 09 '22 00:09 kaby76

I will take a look. There is still an issue with way too many allocations in parse_context. I know what it is (failure to 'emulate' Java objects), but I felt it was too much to try and cram in as a fix at the last minute- programmer checks in code and then gets on a plane type situation ;)

There was one fix that made it to what is the non-v4 version that did not get in to the v4 version (thought they were supposed to be identical). I have submitted a PR for that and is awaiting Ter's approval. But I doubt it is that because that fixes something that just did not work before.

Tomorrow (public holiday here), I will take a look at those grammars and work out what is going on - I suspect it is the allocation problem, but could be a hidden bug that I did not find yet.

I am surprised that they do not terminate (or did they just need more time to run?).

Thanks for the heads up. This code is taking some time to wrangle in to shape.

Sep 09 '22 09:09 jimidle

BTW - I get this when generating, which is a correct warning:

warning(154): asm8080.g4:38:0: rule prog contains an optional block with at least one alternative that can match an empty string

ANTLR4 will deal with this I think, but that could be refactored to eliminate that warning. This may have something to do with it of course. I will not correct it until I understand why the go runtime seems to hate this grammar ;)

Sep 09 '22 09:09 jimidle

@jimidle Sorry, the asm8080/cpm22.asm parse does terminate. It takes about 380 seconds.

The asm8080 grammar "prog" rule should be changed to something like this:

prog : ( EOL* | (line EOL)* line EOL*) EOF ;

(A "prog" can be an empty file, or a file with a bunch of EOLs, or a file with a bunch of "line" that can end with EOL or just EOF. This rule is a royal PITA. The Z80 grammar probably has a similar problem.)

But, even with the change, it still takes 380 seconds.

I noticed that my driver still contained the old "case folding buffer stream" code for Go with a reference to "github.com/antlr/antlr4/runtime/Go/antlr", not "github.com/antlr/antlr4/runtime/Go/antlr/v4"--didn't do a find/sed. So, it had both "github.com/antlr/antlr4/runtime/Go/antlr" and "github.com/antlr/antlr4/runtime/Go/antlr/v4" in the last build on grammars-v4. But, I changed that locally, and the parse still takes 380s.

I'll look at the grammar itself and see if there's a workaround.

Sep 09 '22 11:09 kaby76

@jimidle My God this is such a bad grammar. It's almost wonderful in its "badness".

comment can't derive a default channel string.

line : lbl? (instruction | directive)? comment?;
comment : COMMENT;
COMMENT : ';' ~ [\r\n]* -> skip;

Antlr should flag this. @parrt

Sep 09 '22 12:09 kaby76

Fixed these rules, now works fast.

prog  : EOL* ((line EOL)* line EOL*)? EOF ;
line : lbl? (instruction | directive) comment? | lbl comment? | comment ;
COMMENT : ';' ~ [\r\n]* ;

Sep 09 '22 12:09 kaby76

I did suspect the grammar, but one of the central tenets of ANTLR4 was that it should more or less deal with anything. So I will use the original grammar to find out what is balking the go runtime. Whatever it is, will likely be a benefit to all grammars when I fix it.

On Fri, Sep 9, 2022 at 20:49 Ken Domino @.***> wrote:

Fixed these rules, now works fast.

prog : EOL* ((line EOL)* line EOL*)? EOF ; line : lbl? (instruction | directive) comment? | lbl comment? | comment ; COMMENT : ';' ~ [\r\n]* ;

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/3875#issuecomment-1241937451, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMFT73I32T6JQBQOEW3V5MW6RANCNFSM6AAAAAAQHTMF7U . You are receiving this because you were mentioned.Message ID: @.***>

Sep 09 '22 13:09 jimidle

@jimidle Thanks sounds good. I'll work on fixing up the grammars over in grammars-v4/asm.

The should also add to the "to do" a check that Antlr tool does to flag a grammar if a parser rule uses a "skip" or off-channel symbol.

Sep 09 '22 14:09 kaby76

@kaby76 related issue for skip tokens: https://github.com/antlr/antlr4/issues/2886 (but with using of FRAGMENT instead of skip tokens). Unfortunately, unlikely it will be fixed because my fixing request was closed: https://github.com/antlr/antlr4/pull/3018

Sep 09 '22 18:09 KvanTTT

As the grammar was written, it causes an full context parse for every line in the file, and this exposes some problems in the go runtime there. While this grammar needs quite a bit of work to be "real", it has proven useful as an indicator of problems. Sometimes (no offence meant to the contributor) poorly put together grammars can be useful. We might want to keep that grammar as it is somewhere outside where it will currently be corrected, in order that it can be used as a regression test. If there is no general place to do this, then I will add it as regression test data in the go directory (Go will not download that when the module is imported - well, at least it won't when we get the tags correct and stop adding v4.x.x tags).

Sep 10 '22 04:09 jimidle

For what it is worth, the "corrected" grammar, which I would write like this below takes 0.31 seconds on my system vs 0.37 for the unchanged original grammar. Actually I would probably take out the embedded literals in the parser grammar as well.

grammar asm8080;

prog
   : line* EOF
   ;

line
   : lbl? (instruction | directive)? EOL
   ;

instruction
   : opcode expressionlist?
   ;

opcode
   : OPCODE
   ;

register_
   : REGISTER
   ;

directive
   : argument? assemblerdirective expressionlist
   ;

assemblerdirective
   : ASSEMBLER_DIRECTIVE
   ;

lbl
   : label ':'?
   ;

expressionlist
   : expression (',' expression)*
   ;

label
   : name
   ;

expression
   : multiplyingExpression (('+' | '-') multiplyingExpression)*
   ;

multiplyingExpression
   : argument (('*' | '/') argument)*
   ;

argument
   : number
   | register_
   | dollar
   | name
   | string_
   | ('(' expression ')')
   ;

dollar
   : '$'
   ;

string_
   : STRING
   ;

name
   : NAME
   ;

number
   : NUMBER
   ;

ASSEMBLER_DIRECTIVE
   : (O R G) | (E N D) | (E Q U) | (D B) | (D W) | (D S) | (I F) | (E N D I F) | (S E T)
   ;


REGISTER
   : 'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'L' | 'PC' | 'SP'
   ;


OPCODE
   : (M O V) | (M V I) | (L D A) | (S T A) | (L D A X) | (S T A X) | (L H L D) | (S H L D) | (L X I) | (P U S H) | (P O P) | (X T H L) | (S P H L) | (P C H L) | (X C H G) | (A D D) | (S U B) | (I N R) | (D C R) | (C M P) | (A N A) | (O R A) | (X R A) | (A D I) | (S U I) | (C P I) | (A N I) | (O R I) | (X R I) | (D A A) | (A D C) | (A C I) | (S B B) | (S B I) | (D A D) | (I N X) | (D C X) | (J M P) | (C A L L) | (R E T) | (R A L) | (R A R) | (R L C) | (R R C) | (I N) | (O U T) | (C M C) | (S T C) | (C M A) | (H L T) | (N O P) | (D I) | (E I) | (R S T) | (J N Z) | (J Z) | (J N C) | (J C) | (J P O) | (J P E) | (J P) | (J M) | (C N Z) | (C Z) | (C N C) | (C C) | (C P O) | (C P E) | (C P) | (C M) | (R N Z) | (R Z) | (R N C) | (R C) | (R P O) | (R P E) | (R P) | (R M)
   ;


fragment A
   : ('a' | 'A')
   ;


fragment B
   : ('b' | 'B')
   ;


fragment C
   : ('c' | 'C')
   ;


fragment D
   : ('d' | 'D')
   ;


fragment E
   : ('e' | 'E')
   ;


fragment F
   : ('f' | 'F')
   ;


fragment G
   : ('g' | 'G')
   ;


fragment H
   : ('h' | 'H')
   ;


fragment I
   : ('i' | 'I')
   ;


fragment J
   : ('j' | 'J')
   ;


fragment K
   : ('k' | 'K')
   ;


fragment L
   : ('l' | 'L')
   ;


fragment M
   : ('m' | 'M')
   ;


fragment N
   : ('n' | 'N')
   ;


fragment O
   : ('o' | 'O')
   ;


fragment P
   : ('p' | 'P')
   ;


fragment Q
   : ('q' | 'Q')
   ;


fragment R
   : ('r' | 'R')
   ;


fragment S
   : ('s' | 'S')
   ;


fragment T
   : ('t' | 'T')
   ;


fragment U
   : ('u' | 'U')
   ;


fragment V
   : ('v' | 'V')
   ;


fragment W
   : ('w' | 'W')
   ;


fragment X
   : ('x' | 'X')
   ;


fragment Y
   : ('y' | 'Y')
   ;


fragment Z
   : ('z' | 'Z')
   ;


NAME
   : [a-zA-Z] [a-zA-Z0-9."]*
   ;


NUMBER
   : '$'? [0-9a-fA-F] + ('H' | 'h')?
   ;


COMMENT
   : ';' ~ [\r\n]* -> skip
   ;


STRING
   : '\u0027' ~'\u0027'* '\u0027'
   ;


EOL
   : [\r\n]
   ;


WS
   : [ \t] -> skip
   ;

Sep 10 '22 08:09 jimidle

Excellent. I'll add your note to the "to do list".

Sep 10 '22 11:09 kaby76

I think it would be good to have a test that involved for context parsing all the time, @jimidle. Probably good for the other targets to check this as well.

Sep 15 '22 23:09 parrt

Agreed. I will put something together in the next few days.

On Fri, Sep 16, 2022 at 07:56 Terence Parr @.***> wrote:

I think it would be good to have a test that involved for context parsing all the time, @jimidle https://github.com/jimidle. Probably good for the other targets to check this as well.

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/3875#issuecomment-1248765592, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMETPGB6PVG22Q2YMSDV6OZTJANCNFSM6AAAAAAQHTMF7U . You are receiving this because you were mentioned.Message ID: @.***>

Sep 16 '22 04:09 jimidle

antlr4 antlr4 copied to clipboard

Yes...Terrible Golang performance for 4.11.1

antlr4
antlr4 copied to clipboard