grammars-v4
grammars-v4 copied to clipboard
[cpp] Fix for #4149 -- fix translation/refactoring of several rules from the ISO spec
This PR corrects improper translation/refactoring of several rules from the ISO Spec. Below are the rules from the spec and the transformations used to obtain final form. The changes fix #4149.
- Rules re-scraped and transformed from Spec into correct syntax.
- Rules have been reformatted according to the Coding Standard for this repo using antlr-format-cli.
- New tests entered to test declarations in #4149.
- There is new work to be done. https://github.com/antlr/grammars-v4/issues/4151 https://github.com/antlr/grammars-v4/issues/4152
ptr-abstract-declarator
Spec:
ptr-abstract-declarator:
noptr-abstract-declarator
ptr-operator ptr-abstract-declaratoropt
- Step 1: Convert to Antlr4 syntax.
pointerAbstractDeclarator
: noPointerAbstractDeclarator
| pointerOperator pointerAbstractDeclarator?
;
- Step 2: Remove
?-operator (two sub-steps:a x? => a(x| ) => a x | a):
pointerAbstractDeclarator
: noPointerAbstractDeclarator
| pointerOperator pointerAbstractDeclarator
| pointerOperator
;
- Step 3: Reorder alts.
pointerAbstractDeclarator
: pointerOperator pointerAbstractDeclarator
| noPointerAbstractDeclarator
| pointerOperator
;
- Step 4: Group.
pointerAbstractDeclarator
: pointerOperator pointerAbstractDeclarator
| ( noPointerAbstractDeclarator | pointerOperator )
;
- Step 5: Convert to Kleene operator. Note: From Trash code:
// Right recursion:
// Convert A -> b1 A | b2 A | ... | a1 | a2 | ... ;
// into A -> (b1 | b2 | ...)* (a1 | a2 | ... )
pointerAbstractDeclarator
: pointerOperator* ( noPointerAbstractDeclarator | pointerOperator )
;
noptr-abstract-declarator
Spec:
noptr-abstract-declarator:
noptr-abstract-declaratoropt parameters-and-qualifiers
noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
( ptr-abstract-declarator )
- Step 1: Translated to Antlr4 syntax, which is mutually-left recursive.
noPointerAbstractDeclarator
: noPointerAbstractDeclarator? parametersAndQualifiers
| noPointerAbstractDeclarator? LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| LeftParen pointerAbstractDeclarator RightParen
;
- Step 2: Remove
?-operator.
noPointerAbstractDeclarator
: noPointerAbstractDeclarator parametersAndQualifiers
| parametersAndQualifiers
| noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| LeftParen pointerAbstractDeclarator RightParen
;
- Step 3: Reorder alts.
noPointerAbstractDeclarator
: noPointerAbstractDeclarator parametersAndQualifiers
| noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| parametersAndQualifiers
| LeftParen pointerAbstractDeclarator RightParen
;
- Step 4: Group alts.
noPointerAbstractDeclarator
: noPointerAbstractDeclarator (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
| (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen)
;
- Step 5: Transform to
*-operator closure. Note: From Trash code
// Left recursion:
// A -> A b1 | A b2 | ... | a1 | a2 | ... ;
// => A -> (a1 | a2 | ... ) (b1 | b2 | ...)*;
noPointerAbstractDeclarator
: (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen) (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
;
abstract-pack-declarator (checking correctness)
Spec:
abstract-pack-declarator:
noptr-abstract-pack-declarator
ptr-operator abstract-pack-declarator
- Step 1: Convert to Antlr4 syntax.
abstractPackDeclarator
: noPointerAbstractPackDeclarator
| pointerOperator abstractPackDeclarator
;
- Step 2: Convert to kleene operator.
abstractPackDeclarator
: pointerOperator* noPointerAbstractPackDeclarator
;
This rule is correct and does not need any changes.
noptr-abstract-pack-declarator
Spec:
noptr-abstract-pack-declarator:
noptr-abstract-pack-declarator parameters-and-qualifiers
noptr-abstract-pack-declarator [ constant-expressionopt ] attribute-specifier-seqopt
...
- Step 1: Convert to Antlr4 syntax.
noPointerAbstractPackDeclarator
: noPointerAbstractPackDeclarator parametersAndQualifiers
| noPointerAbstractPackDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| Ellipsis
;
- Step 2: Group.
noPointerAbstractPackDeclarator
: noPointerAbstractPackDeclarator ( parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
| Ellipsis
;
- Step 3: Kleene-ize rule.
noPointerAbstractPackDeclarator
: Ellipsis ( parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
;
@kaby76 I tested this branch with high success rate, the only tests which fail so far for me:
std::unique_ptr<uint8_t[]> test;
std::array<int[1], 1> test;
Again I am parsing expressions statement not a whole header file.
@kaby76 I tested this branch with high success rate, the only tests which fail so far for me:
std::unique_ptr<uint8_t[]> test; std::array<int[1], 1> test;Again I am parsing expressions statement not a whole header file.
Thanks, that's helpful. Keep it coming! Still trying to figure out testing for abstract-pack-declarator and noptr-abstract-pack-declarator.
Looks like some problems with basic types. std::array<int[1], 1> test; not accepted, but std::array<xxx[1], 1> test; is fine.
Thanks, that's helpful. Keep it coming! Still trying to figure out testing for abstract-pack-declarator and noptr-abstract-pack-declarator.
With pleasure! Actually this is how I am catching these defects:
- I wrote a
clangtool which is a part of a bigger infrastructure I am working on - I visit every Decl/Type available and try to parse its fully qualified name (as a statement expression) with the
ANTLRgrammar - I check for parsing failure and assert parsed Decl/Type to be equal to the original ones
Ironically, had to use ANTLR there because clang, while being a top notch compiler, it doesn't parse invalid types, it generates errors instead, which is kind of expected from a compiler.
I tried adding the basic types ("long", "int", etc) to unqualifiedId. This "takes care of" parsing the several new problems you mentioned, e.g., std::array<int[1], 1> test;. But, this also causes some declarations to be parsed out with declarators, when it should not. E.g., int foo( unsigned long );. But, I now realize this hack is wrong because I was trying to get this to parse like std::array<xxx[1], 1> test; and it is not. So the problem is elsewhere.
I'm going to get submit this PR first, and work on getting this new bug fixed as a separate issue. https://github.com/antlr/grammars-v4/issues/4151 I will also need to pass on getting the testing for noptr-abstract-pack-declarator even though I have fixed the rule. That will be made into a separate Github issue. https://github.com/antlr/grammars-v4/issues/4152
Nice! I believe this should be our best bet. Just a question, while C++ being notoriously one of the hardest language to parse, why it should be this hard if we are following the standard specs and not considering the preprocessing phase?
I do have a preprocessor that is generated from the preprocessor. g4 grammar. I haven't added it yet. I think I was trying to figure out how to integrate it with the trgen testing framework.
The cpp grammar may need semantic predicates to get it to work right. But one step at a time.
@teverett This PR is ready. I entered a couple of Issues for things I uncovered while fixing this issue. We'll take care of those issues later. https://github.com/antlr/grammars-v4/issues/4152 https://github.com/antlr/grammars-v4/issues/4151
@kaby76 Just a notice: while I don't know the grammar you are using (you were referring to openstd before), I was able to fix these issues above using this grammar. Actually minor modifications were needed. That grammar is C++20+ though.
@kaby76 Just a notice: while I don't know the grammar you are using (you were referring to openstd before), I was able to fix these issues above using this grammar. Actually minor modifications were needed. That grammar is C++20+ though.
Please share your grammar so I can see that it fixes what you say, see what refactorings you made, and see whether the performance is acceptable. And, yes, the plan is to use the grammar from the ISO spec (or this, or the .tex files for drafts), and generate an Antlr grammar for it.
The reason I applied the refactorings over the rules in dcl.name is to (1) fix the mutually-left recursion, (2) reduce the size of the Antlr parse tree so it can perform better.
In the meanwhile, this PR is ready to merge.
@kaby76 thanks!