
csharp parse method_invocation error

Open ningpp opened this issue 2 years ago • 5 comments

Summary: method_invocation parsed as member_access

//result of expression obj.say("hello!");
//method name "say" parsed as member_access "say"
expressionStatement (
    primary_expression obj (member_access . say) (
        method_invocation ( "hello" )
    )
);

//result of expression obj.callSomeone().say("hello!")
expressionStatement (
    primary_expression obj (member_access . callSomeone) (
        method_invocation ()
    ) (member_access . say) (method_invocation ( "hello" ))
);

code:

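import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public static void main(String[] args) {
    // C# source containing the two invocations from the summary above
    String source = """
            namespace Ningpp {
                public class Demo {
                    void methodInvocationDemo(Person obj) {
                        obj.say("hello!");
                        obj.callSomeone().say("hello!");
                    }
                }
            }
            """;

    // CSharpLexer and CSharpParser are generated from the csharp grammar in grammars-v4
    CSharpLexer lexer = new CSharpLexer(CharStreams.fromString(source));
    CSharpParser parser = new CSharpParser(new CommonTokenStream(lexer));
    ParseTree tree = parser.compilation_unit();

    // TreeRewriter is a helper class that is not shown in this issue
    var node = new TreeRewriter(tree).rewrite();
    System.out.println(node);
}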

ningpp avatar Jan 29 '23 14:01 ningpp

obj.say("hello!") is being parsed as a primary_expression, which contains three children: primary_expression_start ('obj'), member_access ('.say'), and method_invocation ('("hello!")'). What is the error with this structure?

( expression
  ( non_assignment_expression
    ( conditional_expression
      ( null_coalescing_expression
        ( conditional_or_expression
          ( conditional_and_expression
            ( inclusive_or_expression
              ( exclusive_or_expression
                ( and_expression
                  ( equality_expression
                    ( relational_expression
                      ( shift_expression
                        ( additive_expression
                          ( multiplicative_expression
                            ( switch_expression
                              ( range_expression
                                ( unary_expression
                                  ( primary_expression
                                    ( primary_expression_start
                                      ( identifier
                                        ( intertoken text:'\r\n\t    ' tt:7 chnl:HIDDEN
                                        ) 
                                        ( IDENTIFIER
                                          (  text:'obj' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
                                    ) ) ) ) 
                                    ( member_access
                                      ( DOT
                                        (  text:'.' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
                                      ) ) 
                                      ( identifier
                                        ( IDENTIFIER
                                          (  text:'say' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
                                    ) ) ) ) 
                                    ( method_invocation
                                      ( OPEN_PARENS
                                        (  text:'(' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
                                      ) ) 
                                      ( argument_list
                                        ( argument
                                          ( expression
                                            ( non_assignment_expression
                                              ( conditional_expression
                                                ( null_coalescing_expression
                                                  ( conditional_or_expression
                                                    ( conditional_and_expression
                                                      ( inclusive_or_expression
                                                        ( exclusive_or_expression
                                                          ( and_expression
                                                            ( equality_expression
                                                              ( relational_expression
                                                                ( shift_expression
                                                                  ( additive_expression
                                                                    ( multiplicative_expression
                                                                      ( switch_expression
                                                                        ( range_expression
                                                                          ( unary_expression
                                                                            ( primary_expression
                                                                              ( primary_expression_start
                                                                                ( literal
                                                                                  ( string_literal
                                                                                    ( REGULAR_STRING
                                                                                      (  text:'\"hello!\"' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
                                      ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) 
                                      ( CLOSE_PARENS
                                        (  text:')' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) 

In the C# Language Spec, this is represented with an invocation_expression. But I think the problem here is that using that rule un-refactored may have performance issues. It looks like the rule was left-factored, but the reasons for a refactoring are usually never recorded, since people usually do them by hand.

// Source: §11.7.8.1 General
invocation_expression
    : primary_expression '(' argument_list? ')'
    ;
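
To make the difference between the two shapes concrete, here is a minimal sketch (not part of the grammar or of the ANTLR runtime) that folds the flat list of suffixes produced by the current grammar into the nested form implied by the Spec rule above. The class `SuffixFolder`, the `Suffix` type, and the S-expression output format are all hypothetical names invented for this illustration; in a real tool the suffixes would come from the children of the primary_expression node.

```java
import java.util.List;

public class SuffixFolder {
    // Hypothetical model of one child that follows primary_expression_start.
    static final class Suffix {
        final String kind; // "member_access" or "method_invocation"
        final String text; // member name, or the argument text between the parentheses

        Suffix(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Folds the suffixes left to right, so each suffix wraps everything parsed before it.
    static String fold(String start, List<Suffix> suffixes) {
        String expr = start;
        for (Suffix s : suffixes) {
            if (s.kind.equals("member_access")) {
                expr = "(member_access " + expr + " " + s.text + ")";
            } else if (s.kind.equals("method_invocation")) {
                expr = "(invocation_expression " + expr + " (" + s.text + "))";
            } else {
                throw new IllegalArgumentException("unknown suffix kind: " + s.kind);
            }
        }
        return expr;
    }

    public static void main(String[] args) {
        // obj.say("hello!")
        System.out.println(fold("obj", List.of(
                new Suffix("member_access", "say"),
                new Suffix("method_invocation", "\"hello!\""))));

        // obj.callSomeone().say("hello!")
        System.out.println(fold("obj", List.of(
                new Suffix("member_access", "callSomeone"),
                new Suffix("method_invocation", ""),
                new Suffix("member_access", "say"),
                new Suffix("method_invocation", "\"hello!\""))));
    }
}
```

Under these assumptions, each method_invocation suffix becomes an invocation_expression whose first child is the whole expression to its left, which mirrors the Spec's `primary_expression '(' argument_list? ')'`. The flat tree therefore carries the same information; only the nesting differs.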

kaby76 avatar Jan 29 '23 19:01 kaby76

@kaby76 The right structure is primary_expression_start ('obj') and method_invocation say ('("hello!")').

C# Language Spec has primary_expression, but our grammar does not.

https://github.com/antlr/grammars-v4/blob/76baeba5f3536b59da170a6d48bda64cb6887eb7/csharp/CSharpParser.g4#L1213-L1217

ningpp avatar Jan 29 '23 23:01 ningpp

@kaby76 You can also test this expression: obj.PropertyAbc.say("hello");

ningpp avatar Jan 29 '23 23:01 ningpp

C# Language Spec has primary_expression ...

The grammar in this repo is not the same as the Spec, but it likely deviates from the Spec for a good reason. This part of the grammar dates from the very first commit of the grammar, seven years ago (https://github.com/antlr/grammars-v4/commit/e2b4ec77fd843352c86393415842b08b1157f615). We don't know which version of the Spec was used, but it was likely the one from 2006; see https://en.wikipedia.org/wiki/C_Sharp_(programming_language)#Versions. And we don't know why the rules were refactored.

It's on my "to do" list to rewrite this grammar at some point, using automated scraping of the grammar rules from each version of the Spec and cataloging the transformations employed. Grammars should be scraped automatically from a Spec. If a Spec does not exist, the grammar should be scraped directly from the implementation. The refactorings done to turn the grammar into one that is efficient for a parser generator should be automated. Questions such as yours are exactly why there needs to be a detailed paper trail for a grammar.

kaby76 avatar Jan 30 '23 00:01 kaby76

@kaby76 thanks.

ningpp avatar Jan 30 '23 00:01 ningpp