antlr4 icon indicating copy to clipboard operation
antlr4 copied to clipboard

Antlr4 4.9.1 does not detect infinite recursion in a rule

Open kaby76 opened this issue 3 years ago • 8 comments

The Antlr tool allows malformed rules with an infinite recursion. Here are two examples:

There are several other examples in grammars-v4 that have this recursion (here and here, thanks to @matevosmehrabyan for finding). These examples illustrate a need to add a check to the Antlr tool for "non-terminating infinite recursion".

These examples are for a single alt in a single rule, but I could imagine a similar problem with multiple alts and multiple rules. But perhaps Antlr flags a similar malformed grammar because the mutual recursion check would overshadow this.

--Ken

kaby76 avatar Mar 04 '21 11:03 kaby76

As I understand it's detection of such rules is quite compicated. Also, they are warnings but not errors because it's just not possible to parser anything using such rules, for instance, this one:

a: B a;

KvanTTT avatar Nov 13 '21 10:11 KvanTTT

Actually the algorithm to detect infinite recursion is easy to implement, which I show how to do below. I do not define what is an "infinite recusing rule", and I do not give the proof as to the validity of the algorithm, but that should be done.

  1. Construct the ATNs for all rules in the grammar.
  2. Perform a DFS walk of the states in the graph containing the collection of ATNs as follows: a) Put on a stack all start states of all ATNs. b) When visiting a state, examine all edges out from the state. If the edge is over a terminal, visit the transition "to" state. If the transition is over a non-terminal, visit the start state for the ATN of the non-terminal named. c) Do not "visit" a state that has already been visited. Do not visit back edges of the graph. d) After finishing the DFS visits, examine whether the "end state" for an ATN/rule has been visited. If not, then rule is an infinite recursing rule.

The rule a: B a; is actually one of the easiest to verify as infinite: the end state of the ATN is never visited.

kaby76 avatar Nov 13 '21 11:11 kaby76

Ok, this algorithm looks quite simple and fast. I think it should be implemented in ANTLR. Also, left recursive rules should be processed in another way, left recursion part should be ignored (or should be processed after transformation to ordinary recursive rules).

KvanTTT avatar Jan 12 '22 14:01 KvanTTT

Some cases to consider:

infinite1 : (infinite1 'c')+ ; // infinite
infinite2 : ('c' infinite2)+ ; // infinite
infinite3 : (infinite3 'c')* ; // okay
infinite4 : ('c' infinite4)* ; // okay

Antlr will flag one as mutual left recursion (with itself), but it should be a valid rule because I think one can refactor it to make it work just as plain ol' left recursion is done.

kaby76 avatar Jan 12 '22 15:01 kaby76

There is a problem with parentheses support in left recursive rules: https://github.com/antlr/antlr4/issues/564

KvanTTT avatar Jan 12 '22 15:01 KvanTTT

Apparently, the problem with #564 is solved by migrating to Antlr4 grammar. Another thing to do I suppose. Moving past the bootstrapping grammar should be part of the whole cycle when a "next version" is written (if ever). If the tool grammar is updated, I might suggest rewriting the grammar so that RULE_REF and TOKEN_REF are pitched in favor of a semantic-based "ID". The hard-fast tokens force renaming refactorings to be performed in a particular order. Fortunately, I can rename more than one symbol in a refactoring, but there are other similar issues in the grammar where the syntax for lexer and parser rules are separate.

kaby76 avatar Jan 12 '22 15:01 kaby76

Apparently, the problem with #564 is solved by migrating to Antlr4 grammar. Another thing to do I suppose.

I'm also working on mentioned issue, actually I've already refactored code that processes left recursion. Unfortunately ANTLR 4 uses ANTLR 3 internally and it's unlikely ANTLR 3 will be completely dropped soon.

KvanTTT avatar Jan 17 '22 15:01 KvanTTT

This errors are also actual for lexer rules as well. By the way, now we have The following sets of rules are mutually left-recursive for left recursive rules but not for custom recursive rules:

A: 'A' A; // Ok for now
B: B 'B'; // Error: The following sets of rules are mutually left-recursive

KvanTTT avatar Apr 09 '22 21:04 KvanTTT