antlr4 icon indicating copy to clipboard operation
antlr4 copied to clipboard

[antlr4-python3-runtime-4.13.1] ANTLR runtime and generated code versions disagree: 4.13.1!=4.11.2-SNAPSHOT

Open Thomasb81 opened this issue 1 year ago • 15 comments
trafficstars

Hello

This a bug report for https://pypi.org/project/antlr4-python3-runtime/ Apparently something went wrong during the release process of 4.13.1, the generated lexer of the xpath feature has not been regenerated.

This lead to systematic runtime error, when we try to use any antlr4-python3-runtime different than 4.11.2-SNAPSHOT ie:

ANTLR runtime and generated code versions disagree: 4.13.1!=4.11.2-SNAPSHOT

https://github.com/antlr/antlr4/blob/1855d9aa4c5ebb15284b95a07048b6260304629c/runtime/Python3/src/antlr4/xpath/XPathLexer.py#L1 https://github.com/antlr/antlr4/blob/b3bb7439546d2035203790d5513c505720cf0bdd/runtime/Python3/src/antlr4/xpath/XPathLexer.py#L66

Actually the issue still exist in dev branch,: https://github.com/antlr/antlr4/blob/dev/runtime/Python3/src/antlr4/xpath/XPathLexer.py

Probably the release procedure describe here https://github.com/antlr/antlr4/blob/master/doc/releasing-antlr.md miss this update ? @parrt any thought ?

Thomasb81 avatar Jul 06 '24 19:07 Thomasb81

My mistake, apparently it is only an informative message. But running tests coming with runtime show them.

Thomasb81 avatar Jul 06 '24 19:07 Thomasb81

Actually, it's worse than this one issue. Not only are the generated files inconsistent in timestamps/versions, the XPathLexer.g4's are slightly different, and not just in target-specific code, or symbol renaming because target-agnostic format is not followed. I see a token range difference in the .g4 grammar.

$ find . -name XPathLexer.\*
./Cpp/runtime/src/tree/xpath/XPathLexer.cpp
./Cpp/runtime/src/tree/xpath/XPathLexer.g4
./Cpp/runtime/src/tree/xpath/XPathLexer.h
./Cpp/runtime/src/tree/xpath/XPathLexer.tokens
./CSharp/src/Tree/Xpath/XPathLexer.cs
./CSharp/src/Tree/Xpath/XPathLexer.g4
./CSharp/src/Tree/Xpath/XPathLexer.tokens
./Java/src/org/antlr/v4/runtime/tree/xpath/XPathLexer.class
./Java/src/org/antlr/v4/runtime/tree/xpath/XPathLexer.java
./Java/target/classes/org/antlr/v4/runtime/tree/xpath/XPathLexer.class
./Python3/src/antlr4/xpath/XPathLexer.g4
./Python3/src/antlr4/xpath/XPathLexer.py
07/11-06:24:04 /c/Users/Kenne/Documents/GitHub/antlr4/runtime
$ e ./Java/src/org/antlr/v4/runtime/tree/xpath/XPathLexer.java
07/11-06:24:41 /c/Users/Kenne/Documents/GitHub/antlr4/runtime
$ diff ./Cpp/runtime/src/tree/xpath/XPathLexer.g4 ./CSharp/src/Tree/Xpath/XPathLexer.g4
3c3
< tokens { TOKEN_REF, RULE_REF }
---
> tokens { TokenRef, RuleRef }
15,17c15,17
< word: TOKEN_REF
<       |       RULE_REF
<       |       STRING
---
> word: TokenRef
>       |       RuleRef
>       |       String
22,25c22,25
< ANYWHERE : '//' ;
< ROOT   : '/' ;
< WILDCARD : '*' ;
< BANG   : '!' ;
---
> Anywhere : '//' ;
> Root   : '/' ;
> Wildcard : '*' ;
> Bang   : '!' ;
29,32c29,33
<                               if (isupper(getText()[0]))
<                                 setType(TOKEN_REF);
<                               else
<                                 setType(RULE_REF);
---
>                               String text = Text;
>                               if ( Char.IsUpper(text[0]) )
>                                       Type = TokenRef;
>                               else
>                                       Type = RuleRef;
58,59c59,60
<             |   '\uFDF0'..'\uFFFF' // implicitly includes ['\u10000-'\uEFFFF]
<             ;
---
>             |   '\uFDF0'..'\uFFFD'
>             ; // ignores | ['\u10000-'\uEFFFF] ;
61c62
< STRING : '\'' .*? '\'';
---
> String : '\'' .*? '\'' ;
63c64
< //WS : [ \t\r\n]+ -> skip ;
---
> //Ws : [ \t\r\n]+ -> skip ;
07/11-06:25:10 /c/Users/Kenne/Documents/GitHub/antlr4/runtime
$ grep -i -e generated `find . -name 'XPathLexer.*'`
./Cpp/runtime/src/tree/xpath/XPathLexer.cpp:// Generated from XPathLexer.g4 by ANTLR 4.13.0
./Cpp/runtime/src/tree/xpath/XPathLexer.h:// Generated from XPathLexer.g4 by ANTLR 4.13.0
./CSharp/src/Tree/Xpath/XPathLexer.cs:// <auto-generated>
./CSharp/src/Tree/Xpath/XPathLexer.cs://     This code was generated by a tool.
./CSharp/src/Tree/Xpath/XPathLexer.cs://     the code is regenerated.
./CSharp/src/Tree/Xpath/XPathLexer.cs:// </auto-generated>
./CSharp/src/Tree/Xpath/XPathLexer.cs:// Generated from XPathLexer.g4 by ANTLR 4.11.2-SNAPSHOT
./CSharp/src/Tree/Xpath/XPathLexer.cs:[System.CodeDom.Compiler.GeneratedCode("ANTLR", "4.11.2-SNAPSHOT")]
./Python3/src/antlr4/xpath/XPathLexer.py:# Generated from XPathLexer.g4 by ANTLR 4.11.2-SNAPSHOT
07/11-06:27:00 /c/Users/Kenne/Documents/GitHub/antlr4/runtime
$

Note: For one of the current ongoing rewrites of Antlr, I'd highly recommend a complete rewrite of the Antlr tree representation in order to support tree edits and querying of off-channel content, along with replacing the current XPath engine in the with a real one, preferably Selenium, which is the gold standard. Trash uses a port of the ancient Xerces engine, which is excellent, but only supports XPath version 2. I'm still porting Selenium to C#.

kaby76 avatar Jul 11 '24 10:07 kaby76

A difference in case. Not sure why there is such a difference, but it I can't see how this is an issue. I've never had to write a target agnostic frontend/compiler in the last 3 decades... er, ever.

Also, ANTLR produces parse trees. They are only useful for facilitating production of a more formal AST. That's where we do the micro optimizations. If writing a real world compiler, generate LLVM and leave it there.

This is not an official position in any way. Just my thoughts. Python for parsing? Hmmm

jimidle avatar Jul 11 '24 15:07 jimidle

A difference in case.

'\uFDF0'..'\uFFFF' is not the same range as '\uFDF0'..'\uFFFD'. It is not a matter of a difference in case.

kaby76 avatar Jul 11 '24 15:07 kaby76

I think you know what I mean. Or perhaps not.

On Thu, Jul 11, 2024 at 09:34 Ken Domino @.***> wrote:

A difference in case.

'\uFDF0'..'\uFFFF' is not the same range as '\uFDF0'..'\uFFFD'. It is not a matter of a difference in case.

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/4655#issuecomment-2223260633, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMAIGNOFGKWOAHDFSBTZL2QX5AVCNFSM6AAAAABKOVPXACVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRTGI3DANRTGM . You are receiving this because you commented.Message ID: @.***>

jimidle avatar Jul 11 '24 16:07 jimidle

Any updates on this ? Or is there an option to hide the test, in the source, it seems that it cannot be skipped ?

jules-l-jimmy avatar Jul 19 '24 09:07 jules-l-jimmy

we don't always regen those but I'm gonna do a release soon and will try to regen those.

parrt avatar Jul 19 '24 17:07 parrt

Should we propose PR to remove this print/check from python runtime ? It seems not there in other runtime.

Thomasb81 avatar Jul 19 '24 18:07 Thomasb81

@Thomasb81 no, the check is in every runtime (and should stay there, we don't want to deal with very complex errors caused by an undetected change in serialization format)

ericvergnaud avatar Jul 25 '24 19:07 ericvergnaud

@ericvergnaud Sorry I do not understand.

I can't see equivalent code to https://github.com/antlr/antlr4/blob/b3bb7439546d2035203790d5513c505720cf0bdd/runtime/Python3/src/antlr4/xpath/XPathLexer.py#L66

implemented here by a simple print: https://github.com/antlr/antlr4/blob/17f3f5e590084200142e69e1c83695b7d956b778/runtime/Python3/src/antlr4/Recognizer.py#L36

in other implementation java : https://github.com/antlr/antlr4/blob/dev/runtime/Java/src/org/antlr/v4/runtime/tree/xpath/XPathLexer.java. or Cpp implementation : https://github.com/antlr/antlr4/blob/dev/runtime/Cpp/runtime/src/tree/xpath/XPathLexer.cpp

I understand the utility of such check to ensure some compatibility and avoid a first level of support for user that would use tool in version A and runtime in version B. But here we are in a very special case where the generated code is used to implement the runtime it self.

I can't avoid this print when I use xpath feature with python3 -runtime download from pypi, which is very annoying.

What is the right fix to propose : regenerate xpath lexer grammar or align runtime feature or both ?

Thomasb81 avatar Jul 25 '24 21:07 Thomasb81

@ericvergnaud But the check is still with 4.11.2-SNAPSHOT whereas multiple commits switched to 4.13 https://github.com/antlr/antlr4/commit/134eda9f1a815bc2b9b863773a17fe08951cae86 https://github.com/antlr/antlr4/commit/c630cb6bce7af9c71f4d10320da9fa8392ec1855 https://github.com/antlr/antlr4/commit/e9df4649845e55393306603a4dd42cecc5af6bfc The check should be also bumped to 4.13.1 ?

jules-l-jimmy avatar Jul 26 '24 06:07 jules-l-jimmy

The Java version is handwritten hence the difference. The Cpp version has equivalent code here: https://github.com/antlr/antlr4/blob/88a0c7ab03137dfd6b24c2f235652a46d0edfa39/runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp#L260. The right fix is to regenerate the path lexer, which is already on its way.

ericvergnaud avatar Jul 26 '24 07:07 ericvergnaud

An alternative approach would be to handwrite the XPathLexer for Python, I'll happily accept the corresponding PR.

ericvergnaud avatar Jul 26 '24 07:07 ericvergnaud

Trying to get antlr 4.13.2 out. Will try to regen all XPath lexers.

parrt avatar Aug 03 '24 17:08 parrt

I see that 4.13.2 has been publish but I can still read in the python runtime (xpath/XPathLexer.py)

self.checkVersion("4.13.1")

and (Recognizer.py)

    def checkVersion(self, toolVersion):
        runtimeVersion = "4.13.2"
        rvmajor, rvminor = self.extractVersion(runtimeVersion)
        tvmajor, tvminor = self.extractVersion(toolVersion)
        if rvmajor!=tvmajor or rvminor!=tvminor:
            print("ANTLR runtime and generated code versions disagree: "+runtimeVersion+"!="+toolVersion)

Seems to me that the simple regeneration of the XPath grammar is not enough or there is a chicken-egg problem...

Thomasb81 avatar Nov 11 '24 23:11 Thomasb81