
Java20 Grammar - handling of Java contextual keywords as method names

Open amaidment opened this issue 8 months ago • 4 comments

I'm using the ANTLR4 Java20 grammar, and finding some unexpected behaviour when parsing Java code that uses contextual keywords in valid contexts, such as method names - as, indeed, is done in some core Java classes.
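To make the "valid contexts" claim concrete, here is a minimal sketch in plain Java (no ANTLR involved). The class and method names are made up for illustration; the only real API used is java.time.LocalDate.with(TemporalAdjuster), which is one of the core methods named after a contextual keyword. The words "uses", "with", "module" and "open" only have special meaning inside module declarations, so javac accepts them as ordinary method names:

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical demo class: every method name below is a contextual keyword.
class ContextualKeywordDemo {
    static int uses(int other) { return other * 2; }
    static String with(String s) { return "with:" + s; }
    static boolean module() { return true; }
    static void open() { }

    // Calls the real JDK method LocalDate.with(TemporalAdjuster).
    static LocalDate firstOfMonth(LocalDate date) {
        return date.with(TemporalAdjusters.firstDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(uses(21));                                // prints 42
        System.out.println(with("x"));                               // prints with:x
        System.out.println(firstOfMonth(LocalDate.of(2023, 10, 20))); // prints 2023-10-01
    }
}
```

Since this compiles and runs with a standard JDK, a grammar that claims to cover the language would be expected to accept the same declarations.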

To be clear about what I'm doing, this is how I get the output from the parser (I expect that there's an easier way to do this, but this is what I'm using):

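import java.util.concurrent.atomic.AtomicInteger;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

CharStream javaSource = ...;
Java20Lexer lexer = new Java20Lexer(javaSource);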
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
Java20Parser parser = new Java20Parser(tokens);
Java20Parser.CompilationUnitContext compilation = parser.compilationUnit();

ParseTreeWalker walker = new ParseTreeWalker();
Java20ParserBaseListener listener = new Java20ParserBaseListener() {
  // Current indentation depth, in spaces.
  private final AtomicInteger _indent = new AtomicInteger(0);

  @Override
  public void enterEveryRule(ParserRuleContext ctx) {
    // Print rule index, context class and matched text, then indent for the children.
    int indent = _indent.getAndAdd(2);
    System.out.println(" ".repeat(indent)+ctx.getRuleIndex()+" ("+ctx.getClass().getSimpleName()+"): "+ctx.getPayload().getText());
  }

  @Override
  public void exitEveryRule(ParserRuleContext ctx) {
    // Dedent, then print the closing line for this rule.
    int indent = _indent.addAndGet(-2);
    System.out.println(" ".repeat(indent)+ctx.getRuleIndex()+" ("+ctx.getClass().getSimpleName()+")");
  }
};
walker.walk(listener, compilation);

First of all, let's start with something that does work. Example 1: If the method returns void and has no parameters, it is parsed correctly:

public class DemoClass {
  public void uses() {
  }
}

In this case, the method is correctly identified within the class body. The relevant extract:

55 (ClassBodyContext): {publicvoiduses(){}}
  56 (ClassBodyDeclarationContext): publicvoiduses(){}
    57 (ClassMemberDeclarationContext): publicvoiduses(){}
      73 (MethodDeclarationContext): publicvoiduses(){}
        74 (MethodModifierContext): public
        74 (MethodModifierContext)
        75 (MethodHeaderContext): voiduses()
          76 (ResultContext): void
          76 (ResultContext)
          77 (MethodDeclaratorContext): uses()
          77 (MethodDeclaratorContext)
        75 (MethodHeaderContext)
        86 (MethodBodyContext): {}
          138 (BlockContext): {}
          138 (BlockContext)
        86 (MethodBodyContext)
      73 (MethodDeclarationContext)
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)

Example 2: If the method returns void but takes a parameter, it is recognized as a method, but the parameters are not parsed correctly:

public class DemoClass {
  public void uses(double other) {
  }
}

This correctly identifies that there is a method called "uses", but does not handle the rest of the method correctly. The relevant extract:

55 (ClassBodyContext): {publicvoiduses(){}}
  56 (ClassBodyDeclarationContext): publicvoiduses(
    57 (ClassMemberDeclarationContext): publicvoiduses(
      73 (MethodDeclarationContext): publicvoiduses(
        74 (MethodModifierContext): public
        74 (MethodModifierContext)
        75 (MethodHeaderContext): voiduses()
          76 (ResultContext): void
          76 (ResultContext)
          77 (MethodDeclaratorContext): uses(
          77 (MethodDeclaratorContext)
        75 (MethodHeaderContext)
        86 (MethodBodyContext): {}
        86 (MethodBodyContext)
      73 (MethodDeclarationContext)
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): 
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): double
    57 (ClassMemberDeclarationContext): double
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext):
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): other)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): {}
    87 (InstanceInitializerContext): {}
      138 (BlockContent): {}
      138 (BlockContent)
    87 (InstanceInitializerContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)

Example 3: A Java class that uses the contextual keyword "uses" as the name of a method with a return value fails to parse the method correctly:

public class DemoClass {
  public int uses(int other) {
    return other * 2;
  }
}

This never even identifies that there is a method:

55 (ClassBodyContext): {publicintuses(intother){returnother*2;}}
  56 (ClassBodyDeclarationContext): 
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): public
    57 (ClassMemberDeclarationContext): public
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): intuses(
    57 (ClassMemberDeclarationContext): intuses(
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): 
    57 (ClassMemberDeclarationContext):
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): int
    57 (ClassMemberDeclarationContext): int
    57 (ClassMemberDeclarationContext)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): 
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): other)
  56 (ClassBodyDeclarationContext)
  56 (ClassBodyDeclarationContext): {returnother*2;}
    87 (InstanceInitializerContext): {returnother*2;}
      138 (BlockContent): {returnother*2}
        ... // lots of elements
      138 (BlockContent)
    87 (InstanceInitializerContext)
  56 (ClassBodyDeclarationContext)
55 (ClassBodyContext)

I have seen a couple of previous issues that report similar problems with contextual keywords used as field names, but nothing relating to method names:

  1. https://github.com/antlr/grammars-v4/issues/3246
  2. https://github.com/antlr/grammars-v4/issues/3549

It's not clear to me whether the comments on those issues are saying that this is simply a limitation of ANTLR4 or of the Java grammar, i.e. that it cannot handle contextual keywords wherever they may legally appear in Java code.
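For what it's worth, my understanding is that grammars commonly cope with this by letting the identifier rule accept the contextual keyword tokens as well as plain identifiers. The following is only a toy sketch of that idea in plain Java, not how the Java20 grammar or ANTLR actually works; the class name, the method names and the (deliberately incomplete) keyword sets are all made up for illustration:

```java
import java.util.Set;

// Toy sketch: an "identifier" check that admits contextual keywords.
class ContextualIdentifierSketch {
    // Illustrative subset of reserved keywords: never valid as identifiers.
    static final Set<String> RESERVED =
        Set.of("public", "void", "int", "double", "return", "class");
    // Illustrative subset of contextual keywords: special only in module declarations.
    static final Set<String> CONTEXTUAL =
        Set.of("uses", "with", "module", "open", "exports", "requires", "provides", "to");
    static final Set<String> RESULT_TYPES = Set.of("void", "int", "double");

    // Hypothetical identifier rule: plain identifiers OR contextual keywords.
    static boolean isIdentifier(String word) {
        if (word.isEmpty() || RESERVED.contains(word)) return false;
        if (CONTEXTUAL.contains(word)) return true;
        if (!Character.isJavaIdentifierStart(word.charAt(0))) return false;
        for (int i = 1; i < word.length(); i++) {
            if (!Character.isJavaIdentifierPart(word.charAt(i))) return false;
        }
        return true;
    }

    // Hypothetical check for the start of a method header: <result> <name> "("
    static boolean looksLikeMethodHeader(String result, String name, String next) {
        boolean resultOk = RESULT_TYPES.contains(result) || isIdentifier(result);
        return resultOk && isIdentifier(name) && next.equals("(");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMethodHeader("int", "uses", "("));   // prints true
        System.out.println(looksLikeMethodHeader("void", "with", "("));  // prints true
        System.out.println(looksLikeMethodHeader("int", "return", "(")); // prints false
    }
}
```

If the identifier check only accepted plain identifiers, "uses" would be rejected in the name position, which resembles the behaviour in Example 3.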

It's also worth noting that this means that core Java classes that implement Temporal - like LocalDate and LocalTime - cannot be parsed correctly, because the grammar cannot properly handle Temporal.with(TemporalAdjuster).

amaidment avatar Oct 20 '23 14:10 amaidment

Note: I should also add that these are not issues that I see with the main (if that's the right term?) Java grammar, which is described as supporting Java 7, 8, 11 and 17.

I have also come across this discussion, which makes the following points:

amaidment avatar Oct 20 '23 15:10 amaidment

The java20 grammar has issues. It doesn't parse the JDK as well as the java/ grammar. Best to use the java/ grammar even though it isn't up to date.

kaby76 avatar Oct 20 '23 17:10 kaby76

"The java20 grammar has issues. It doesn't parse the JDK as well as the java/ grammar. Best to use the java/ grammar even though it isn't up to date."

Might it be possible to update the README.md for both the grammars:

  • for java/java20 - that it is a work in progress, and that most users would be better served with the java/java grammar
  • for java/java - clarify which versions of Java the grammar works with - is it expected to be compatible with Java 20 and/or Java 21?

As it stands, at face value it appears (or did appear to me) that java/java20 is the more complete and up-to-date grammar. It may become that, but it doesn't appear to have reached that status yet.

amaidment avatar Oct 20 '23 17:10 amaidment

  • for java/java

This one doesn't even parse method modifiers.

yamin8000 avatar Dec 04 '23 13:12 yamin8000