grammars-v4
Java20 Grammar - handling of Java contextual keywords as method names
I'm using the ANTLR4 Java20 grammar and am seeing unexpected behaviour when parsing Java code that uses contextual keywords in valid contexts, such as method names. Some core Java classes do exactly this.
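For reference, here is a minimal, self-contained class that declares methods whose names are contextual keywords. The class and method names are my own illustration, not taken from the JDK. As far as I know it compiles and runs with a standard JDK (Java 17 or later assumed), so a grammar that claims to cover recent Java versions should accept it.

```java
import java.util.List;

// Hypothetical demo class: each method name below is a contextual keyword
// (JLS §3.9), which is still a legal Identifier in a method declaration.
class ContextualKeywordMethods {
    int uses(int other) { return other * 2; }                // same shape as Example 3 below
    void uses(double other) { }                              // same shape as Example 2 below
    String with(String suffix) { return "with:" + suffix; }  // cf. Temporal.with(...)
    boolean record() { return true; }
    int var() { return 1; }
    List<String> permits() { return List.of("A", "B"); }
    String module() { return "m"; }

    public static void main(String[] args) {
        ContextualKeywordMethods m = new ContextualKeywordMethods();
        System.out.println(m.uses(21));   // 42 (the int overload is chosen)
        m.uses(1.5);                      // the void, double overload
        System.out.println(m.with("x"));  // with:x
        System.out.println(m.record() + " " + m.var() + " " + m.permits()); // true 1 [A, B]
        System.out.println(m.module());   // m
    }
}
```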
To give a clear picture of what I'm doing, this is how I get the output from the parser (I expect there's an easier way to do this, but this is what I'm using):
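import java.util.concurrent.atomic.AtomicInteger;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

CharStream javaSource = ...; // obtained elsewhere, e.g. via CharStreams
Java20Lexer lexer = new Java20Lexer(javaSource);
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
Java20Parser parser = new Java20Parser(tokens);
Java20Parser.CompilationUnitContext compilation = parser.compilationUnit();

// Print each rule on entry (with the text it matched) and on exit, indented by nesting depth.
ParseTreeWalker walker = new ParseTreeWalker();
Java20ParserBaseListener listener = new Java20ParserBaseListener() {
    private final AtomicInteger _indent = new AtomicInteger(0);

    @Override
    public void enterEveryRule(ParserRuleContext ctx) {
        int indent = _indent.getAndAdd(2); // print at current depth, then go one level deeper
        System.out.println(" ".repeat(indent) + ctx.getRuleIndex() + " (" + ctx.getClass().getSimpleName() + "): " + ctx.getText());
    }

    @Override
    public void exitEveryRule(ParserRuleContext ctx) {
        int indent = _indent.addAndGet(-2); // come back up one level, then print
        System.out.println(" ".repeat(indent) + ctx.getRuleIndex() + " (" + ctx.getClass().getSimpleName() + ")");
    }
};
walker.walk(listener, compilation);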
First of all, let's start with something that does work. Example 1: if the method returns void and has no parameters, it is parsed correctly:
public class DemoClass {
    public void uses() {
    }
}
In this case, the class body correctly identifies the method. Here is the relevant extract:
55 (ClassBodyContext): {publicvoiduses(){}}
  56 (ClassBodyDeclarationContext): publicvoiduses(){}
    57 (ClassMemberDeclarationContext): publicvoiduses(){}
      73 (MethodDeclarationContext): publicvoiduses(){}
        74 (MethodModifierContext): public
        74 (MethodModifierContext)
        75 (MethodHeaderContext): voiduses()
          76 (ResultContext): void
          76 (ResultContext)
          77 (MethodDeclaratorContext): uses()
          77 (MethodDeclaratorContext)
        75 (MethodHeaderContext)
        86 (MethodBodyContext): {}
          138 (BlockContext): {}
          138 (BlockContext)
        86 (MethodBodyContext)
      73 (MethodDeclarationContext)
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)
Example 2: if the method returns void but takes a parameter, it is recognized as a method, but its parameter list is not parsed correctly:
public class DemoClass {
    public void uses(double other) {
    }
}
This correctly identifies that there is a method called "uses", but does not handle the rest of the method correctly. Here is the relevant extract:
55 (ClassBodyContext): {publicvoiduses(){}}
  56 (ClassBodyDeclarationContext): publicvoiduses(
    57 (ClassMemberDeclarationContext): publicvoiduses(
      73 (MethodDeclarationContext): publicvoiduses(
        74 (MethodModifierContext): public
        74 (MethodModifierContext)
        75 (MethodHeaderContext): voiduses()
          76 (ResultContext): void
          76 (ResultContext)
          77 (MethodDeclaratorContext): uses(
          77 (MethodDeclaratorContext)
        75 (MethodHeaderContext)
        86 (MethodBodyContext): {}
        86 (MethodBodyContext)
      73 (MethodDeclarationContext)
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext):
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): double
    57 (ClassMemberDeclarationContext): double
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext):
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): other)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): {}
    87 (InstanceInitializerContext): {}
      138 (BlockContext): {}
      138 (BlockContext)
    87 (InstanceInitializerContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)
Example 3: in a Java class that uses the contextual keyword "uses" as the name of a method with a non-void return type, the method is not parsed correctly:
public class DemoClass {
    public int uses(int other) {
        return other * 2;
    }
}
This never even identifies that there is a method:
55 (ClassBodyContext): {publicintuses(intother){returnother*2;}}
  56 (ClassBodyDeclarationContext):
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): public
    57 (ClassMemberDeclarationContext): public
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): intuses(
    57 (ClassMemberDeclarationContext): intuses(
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext):
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): int
    57 (ClassMemberDeclarationContext): int
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext):
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): other)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): {returnother*2;}
    87 (InstanceInitializerContext): {returnother*2;}
      138 (BlockContext): {returnother*2;}
        ... // lots of elements
      138 (BlockContext)
    87 (InstanceInitializerContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)
I have seen a couple of earlier issues that report similar problems with contextual keywords used as field names, but nothing relating to method names:
- https://github.com/antlr/grammars-v4/issues/3246
- https://github.com/antlr/grammars-v4/issues/3549
It's not clear to me whether the comments there (here and here) are saying that handling contextual keywords wherever they may appear in Java code is simply a limitation of ANTLR4 or of the Java grammar.
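To make the language side of this concrete, the sketch below encodes the word lists and restrictions from JLS §3.8 and §3.9 as I read them for Java 17 to 20 (Java 21 adds `when` as a further contextual keyword). The class `JavaWords` and its methods are hypothetical helpers, not part of any ANTLR or JDK API. The point it illustrates: contextual keywords are lexically ordinary identifiers; a few are excluded only in specific positions (type names, unqualified method invocations), and none of those that are lexically identifiers is excluded as a method name in a declaration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper encoding JLS 3.8/3.9 (Java 17-20 editions, as I read them).
class JavaWords {
    // ReservedKeyword: never usable as an identifier.
    static final Set<String> RESERVED = Set.of(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "enum",
        "extends", "final", "finally", "float", "for", "goto", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new",
        "package", "private", "protected", "public", "return", "short", "static",
        "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
        "transient", "try", "void", "volatile", "while", "_");

    // ContextualKeyword: treated as keywords only in particular syntactic contexts.
    static final Set<String> CONTEXTUAL = Set.of(
        "exports", "module", "non-sealed", "open", "opens", "permits", "provides",
        "record", "requires", "sealed", "to", "transitive", "uses", "var", "with",
        "yield");

    static final Set<String> LITERALS = Set.of("true", "false", "null");

    // Words excluded from TypeIdentifier.
    static final Set<String> NOT_TYPE_NAMES = Set.of("permits", "record", "sealed", "var", "yield");

    // Identifier: IdentifierChars, but not a ReservedKeyword, BooleanLiteral or NullLiteral.
    static boolean isIdentifier(String w) {
        if (w.isEmpty() || !Character.isJavaIdentifierStart(w.codePointAt(0))) return false;
        if (!w.codePoints().allMatch(Character::isJavaIdentifierPart)) return false;
        return !RESERVED.contains(w) && !LITERALS.contains(w);
    }

    // MethodDeclarator uses plain Identifier, so every contextual keyword that is
    // lexically an identifier (all except non-sealed) is a valid declared method name.
    static boolean isValidMethodDeclarationName(String w) { return isIdentifier(w); }

    // TypeIdentifier: Identifier minus permits, record, sealed, var and yield.
    static boolean isValidTypeName(String w) {
        return isIdentifier(w) && !NOT_TYPE_NAMES.contains(w);
    }

    // UnqualifiedMethodIdentifier: Identifier minus yield.
    static boolean isValidUnqualifiedInvocationName(String w) {
        return isIdentifier(w) && !w.equals("yield");
    }

    public static void main(String[] args) {
        for (String w : new TreeSet<>(CONTEXTUAL)) {
            System.out.printf("%-11s method decl: %-5b type name: %-5b unqualified call: %b%n",
                w, isValidMethodDeclarationName(w), isValidTypeName(w),
                isValidUnqualifiedInvocationName(w));
        }
    }
}
```

Under this reading, `public int uses(int other)` and `Temporal with(TemporalAdjuster adjuster)` are ordinary method declarations, so a grammar that accepts them only when the return type is `void` (Examples 2 and 3) looks like a grammar issue rather than a language restriction.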
It's also worth noting that this means core Java classes that implement Temporal, such as LocalDate and LocalTime, cannot be parsed correctly, because the grammar cannot handle the method Temporal.with(TemporalAdjuster) that they implement (with is also a contextual keyword).
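A quick way to confirm that `with` is an ordinary method name in the JDK is simply to call it. The snippet below (the class `WithDemo` and its helper methods are my own, hypothetical names) uses `LocalDate.with(TemporalAdjuster)` and `LocalDate.with(TemporalField, long)` from `java.time`, which exist since Java 8.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAdjusters;

class WithDemo {
    // LocalDate.with(TemporalAdjuster): LocalDate's override of Temporal.with(...)
    static LocalDate firstOfMonth(LocalDate d) {
        return d.with(TemporalAdjusters.firstDayOfMonth());
    }

    // LocalDate.with(TemporalField, long): the other 'with' overload
    static LocalDate withDay(LocalDate d, int day) {
        return d.with(ChronoField.DAY_OF_MONTH, day);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2024, 1, 15);  // a Monday
        System.out.println(firstOfMonth(d));      // 2024-01-01
        System.out.println(withDay(d, 31));       // 2024-01-31
        System.out.println(d.with(TemporalAdjusters.next(DayOfWeek.MONDAY))); // 2024-01-22
    }
}
```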
Note: I should also add that I don't see these issues with the main (if that's the right term?) Java grammar, java/java, which is described as supporting Java 7, 8, 11 and 17.
I have also come across this discussion, which makes the following points:
- the java/java grammar is a lot faster, if less clearly linked to the JLS.
- the java/java grammar does support contextual keywords.
The java20 grammar has issues. It doesn't parse the JDK as well as the java/ grammar. Best to use the java/ grammar even though it isn't up to date.
Might it be possible to update the README.md for both grammars:
- for java/java20: state that it is a work in progress, and that most users would be better served by the java/java grammar
- for java/java: clarify which versions of Java the grammar supports. Is it expected to be compatible with Java 20 and/or Java 21?
As it stands, at face value the java/java20 grammar appears (or at least appeared to me) to be the more complete and up-to-date one. It may become that, but it doesn't seem to have reached that status yet.
- for java/java
This one doesn't even parse method modifiers.