grammars-v4 icon indicating copy to clipboard operation
grammars-v4 copied to clipboard

[cpp] Fix for #4149 -- fix translation/refactoring of several rules from the ISO spec

Open kaby76 opened this issue 1 year ago • 7 comments
trafficstars

This PR corrects improper translation/refactoring of several rules from the ISO Spec. Below are the rules from the spec and the transformations used to obtain final form. The changes fix #4149.

  • Rules re-scraped and transformed from Spec into correct syntax.
  • Rules have been reformatted according to the Coding Standard for this repo using antlr-format-cli.
  • New tests entered to test declarations in #4149.
  • There is new work to be done. https://github.com/antlr/grammars-v4/issues/4151 https://github.com/antlr/grammars-v4/issues/4152

ptr-abstract-declarator

Spec:

ptr-abstract-declarator:
    noptr-abstract-declarator
    ptr-operator ptr-abstract-declaratoropt
  • Step 1: Convert to Antlr4 syntax.
pointerAbstractDeclarator
    : noPointerAbstractDeclarator
    | pointerOperator pointerAbstractDeclarator?
    ;
  • Step 2: Remove ?-operator (two sub-steps: a x? => a(x| ) => a x | a):
pointerAbstractDeclarator
    : noPointerAbstractDeclarator
    | pointerOperator pointerAbstractDeclarator
    | pointerOperator
    ;
  • Step 3: Reorder alts.
pointerAbstractDeclarator
    : pointerOperator pointerAbstractDeclarator
    | noPointerAbstractDeclarator
    | pointerOperator
    ;
  • Step 4: Group.
pointerAbstractDeclarator
    : pointerOperator pointerAbstractDeclarator
    | ( noPointerAbstractDeclarator | pointerOperator )
    ;
// Right recursion:
// Convert A -> b1 A | b2 A | ... | a1 | a2 | ... ;
// into A ->   (b1 | b2 | ...)* (a1 | a2 | ... )
pointerAbstractDeclarator
    : pointerOperator* ( noPointerAbstractDeclarator | pointerOperator )
    ;

noptr-abstract-declarator

Spec:

noptr-abstract-declarator:
    noptr-abstract-declaratoropt parameters-and-qualifiers
    noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
    ( ptr-abstract-declarator )
  • Step 1: Translated to Antlr4 syntax, which is mutually-left recursive.
noPointerAbstractDeclarator
    : noPointerAbstractDeclarator? parametersAndQualifiers
    | noPointerAbstractDeclarator? LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;
  • Step 2: Remove ?-operator.
noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;
  • Step 3: Reorder alts.
noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | parametersAndQualifiers
    | LeftParen pointerAbstractDeclarator RightParen
    ;
  • Step 4: Group alts.
noPointerAbstractDeclarator
    : noPointerAbstractDeclarator (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
    | (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen)
    ;
// Left recursion:
// A -> A b1 | A b2 | ... | a1 | a2 | ... ;
// => A ->  (a1 | a2 | ... ) (b1 | b2 | ...)*;
noPointerAbstractDeclarator
    : (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen) (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
    ;

abstract-pack-declarator (checking correctness)

Spec:

abstract-pack-declarator:
    noptr-abstract-pack-declarator
    ptr-operator abstract-pack-declarator
  • Step 1: Convert to Antlr4 syntax.
abstractPackDeclarator
    : noPointerAbstractPackDeclarator
    | pointerOperator abstractPackDeclarator
    ;
  • Step 2: Convert to kleene operator.
abstractPackDeclarator
    : pointerOperator* noPointerAbstractPackDeclarator
    ;

This rule is correct and does not need any changes.

noptr-abstract-pack-declarator

Spec:

noptr-abstract-pack-declarator:
    noptr-abstract-pack-declarator parameters-and-qualifiers
    noptr-abstract-pack-declarator [ constant-expressionopt ] attribute-specifier-seqopt
    ...
  • Step 1: Convert to Antlr4 syntax.
noPointerAbstractPackDeclarator
    : noPointerAbstractPackDeclarator parametersAndQualifiers
    | noPointerAbstractPackDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | Ellipsis
    ;
  • Step 2: Group.
noPointerAbstractPackDeclarator
    : noPointerAbstractPackDeclarator ( parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
    | Ellipsis
    ;
  • Step 3: Kleene-ize rule.
noPointerAbstractPackDeclarator
    : Ellipsis ( parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
    ;

kaby76 avatar Jul 02 '24 03:07 kaby76

@kaby76 I tested this branch with high success rate, the only tests which fail so far for me:

std::unique_ptr<uint8_t[]> test;
std::array<int[1], 1> test;

Again I am parsing expressions statement not a whole header file.

deadlocklogic avatar Jul 02 '24 14:07 deadlocklogic

@kaby76 I tested this branch with high success rate, the only tests which fail so far for me:

std::unique_ptr<uint8_t[]> test;
std::array<int[1], 1> test;

Again I am parsing expressions statement not a whole header file.

Thanks, that's helpful. Keep it coming! Still trying to figure out testing for abstract-pack-declarator and noptr-abstract-pack-declarator.

kaby76 avatar Jul 02 '24 15:07 kaby76

Looks like some problems with basic types. std::array<int[1], 1> test; not accepted, but std::array<xxx[1], 1> test; is fine.

kaby76 avatar Jul 02 '24 15:07 kaby76

Thanks, that's helpful. Keep it coming! Still trying to figure out testing for abstract-pack-declarator and noptr-abstract-pack-declarator.

With pleasure! Actually this is how I am catching these defects:

  • I wrote a clang tool which is a part of a bigger infrastructure I am working on
  • I visit every Decl/Type available and try to parse its fully qualified name (as a statement expression) with the ANTLR grammar
  • I check for parsing failure and assert parsed Decl/Type to be equal to the original ones

Ironically, had to use ANTLR there because clang, while being a top notch compiler, it doesn't parse invalid types, it generates errors instead, which is kind of expected from a compiler.

deadlocklogic avatar Jul 02 '24 15:07 deadlocklogic

I tried adding the basic types ("long", "int", etc) to unqualifiedId. This "takes care of" parsing the several new problems you mentioned, e.g., std::array<int[1], 1> test;. But, this also causes some declarations to be parsed out with declarators, when it should not. E.g., int foo( unsigned long );. But, I now realize this hack is wrong because I was trying to get this to parse like std::array<xxx[1], 1> test; and it is not. So the problem is elsewhere.

I'm going to get submit this PR first, and work on getting this new bug fixed as a separate issue. https://github.com/antlr/grammars-v4/issues/4151 I will also need to pass on getting the testing for noptr-abstract-pack-declarator even though I have fixed the rule. That will be made into a separate Github issue. https://github.com/antlr/grammars-v4/issues/4152

kaby76 avatar Jul 03 '24 10:07 kaby76

Nice! I believe this should be our best bet. Just a question, while C++ being notoriously one of the hardest language to parse, why it should be this hard if we are following the standard specs and not considering the preprocessing phase?

deadlocklogic avatar Jul 03 '24 10:07 deadlocklogic

I do have a preprocessor that is generated from the preprocessor. g4 grammar. I haven't added it yet. I think I was trying to figure out how to integrate it with the trgen testing framework.

The cpp grammar may need semantic predicates to get it to work right. But one step at a time.

kaby76 avatar Jul 03 '24 10:07 kaby76

@teverett This PR is ready. I entered a couple of Issues for things I uncovered while fixing this issue. We'll take care of those issues later. https://github.com/antlr/grammars-v4/issues/4152 https://github.com/antlr/grammars-v4/issues/4151

kaby76 avatar Jul 07 '24 15:07 kaby76

@kaby76 Just a notice: while I don't know the grammar you are using (you were referring to openstd before), I was able to fix these issues above using this grammar. Actually minor modifications were needed. That grammar is C++20+ though.

deadlocklogic avatar Jul 07 '24 16:07 deadlocklogic

@kaby76 Just a notice: while I don't know the grammar you are using (you were referring to openstd before), I was able to fix these issues above using this grammar. Actually minor modifications were needed. That grammar is C++20+ though.

Please share your grammar so I can see that it fixes what you say, see what refactorings you made, and see whether the performance is acceptable. And, yes, the plan is to use the grammar from the ISO spec (or this, or the .tex files for drafts), and generate an Antlr grammar for it.

The reason I applied the refactorings over the rules in dcl.name is to (1) fix the mutually-left recursion, (2) reduce the size of the Antlr parse tree so it can perform better.

In the meanwhile, this PR is ready to merge.

kaby76 avatar Jul 08 '24 11:07 kaby76

@kaby76 thanks!

teverett avatar Jul 15 '24 15:07 teverett