antlr4 incorrect 'no viable alternative at input' error is thrown

The bug is described in https://stackoverflow.com/questions/73490454/why-is-no-viable-alternative-at-input-error-throw

Consider the following grammar:

grammar CommaSeparatorField;

document : field ( COMMA field)* EOF ;

field	:  value[false]+ ;   

value[ boolean commaAllow]
	: TEXT
	| NUMBER 
	| {$commaAllow}? COMMA
	;
	
TEXT : [a-zA-Z]+ ;
NUMBER : [0-9]+ ;
COMMA : ',' ;

Now consider this nearly equivalent grammar (the only change is that the $commaAllow condition is replaced by a function that returns false):

grammar CommaSeparatorField;

@parser::members
	{
	boolean falseFuntion() {
		return false;
		}	
	}

document : field ( COMMA field)* EOF ;

field	:  value[false]+ ;   

value[ boolean commaAllow]
	: TEXT
	| NUMBER 
	| {falseFuntion()}? COMMA
	;
	
TEXT : [a-zA-Z]+ ;
NUMBER : [0-9]+ ;
COMMA : ',' ;

and this one (the commaAllow parameter is removed completely):

grammar CommaSeparatorField;

@parser::members
	{
	boolean falseFuntion() {
		return false;
		}	
	}

document : field ( COMMA field)* EOF ;

field	:  value+ ;   

value
	: TEXT
	| NUMBER 
	| {falseFuntion()}? COMMA
	;
	
TEXT : [a-zA-Z]+ ;
NUMBER : [0-9]+ ;
COMMA : ',' ;

If the input text is:

ab12cd,ef34gh

the first grammar gives (in my opinion this is the bug):

line 1:6 no viable alternative at input ','

The second and third grammars give no error.

The parser trace for the first grammar is:

enter   document, LT(1)=ab
enter   field, LT(1)=ab
enter   value, LT(1)=ab
consume [@0,0:1='ab',<1>,1:0] rule value
exit    value, LT(1)=12
enter   value, LT(1)=12
consume [@1,2:3='12',<2>,1:2] rule value
exit    value, LT(1)=cd
enter   value, LT(1)=cd
consume [@2,4:5='cd',<1>,1:4] rule value
exit    value, LT(1)=,

enter   value, LT(1)=,    <- wrong behaviour: why does it try to enter 'value' when commaAllow==false and LT(1)=, ?
                             Shouldn't it instead:
                                 - exit 'field'
                                 - consume ',' (COMMA) in rule 'document'
                                 - enter rule 'field' again
                             ?

**** "no viable alternative at input ','" is thrown here

exit    value, LT(1)=,
enter   value, LT(1)=,
exit    value, LT(1)=ef
enter   value, LT(1)=ef
consume [@4,7:8='ef',<1>,1:7] rule value
exit    value, LT(1)=34
enter   value, LT(1)=34
consume [@5,9:10='34',<2>,1:9] rule value
exit    value, LT(1)=gh
enter   value, LT(1)=gh
consume [@6,11:12='gh',<1>,1:11] rule value
exit    value, LT(1)=<EOF>
exit    field, LT(1)=<EOF>
consume [@7,13:12='<EOF>',<-1>,1:13] rule document
exit    document, LT(1)=<EOF>

The parser trace for the second and third grammars is:

enter   document, LT(1)=ab
enter   field, LT(1)=ab
enter   value, LT(1)=ab
consume [@0,0:1='ab',<1>,1:0] rule value
exit    value, LT(1)=12
enter   value, LT(1)=12
consume [@1,2:3='12',<2>,1:2] rule value
exit    value, LT(1)=cd
enter   value, LT(1)=cd
consume [@2,4:5='cd',<1>,1:4] rule value
exit    value, LT(1)=,
exit    field, LT(1)=,                                            <-- Right behaviour
consume [@3,6:6=',',<3>,1:6] rule document
enter   field, LT(1)=ef
enter   value, LT(1)=ef
consume [@4,7:8='ef',<1>,1:7] rule value
exit    value, LT(1)=34
enter   value, LT(1)=34
consume [@5,9:10='34',<2>,1:9] rule value
exit    value, LT(1)=gh
enter   value, LT(1)=gh
consume [@6,11:12='gh',<1>,1:11] rule value
exit    value, LT(1)=<EOF>
exit    field, LT(1)=<EOF>
consume [@7,13:12='<EOF>',<-1>,1:13] rule document
exit    document, LT(1)=<EOF>

Aug 26 '22 09:08 alejandro-anadon

It seems that somewhere during parser generation ANTLR checks whether a parameter is used in a rule alternative. The following grammar also fails:


grammar CommaSeparatorField;

@parser::members
	{
	boolean falseFuntion(boolean foo) {
		return false;
		}	
	}

document : field ( COMMA field)* EOF ;

field	:  value[false]+ ;   

value [ boolean commaAllow]
	: TEXT
	| NUMBER 
	| {falseFuntion($commaAllow)}? COMMA
	;
	
TEXT : [a-zA-Z]+ ;
NUMBER : [0-9]+ ;
COMMA : ',' ;

This confirms (in my opinion) that it is a bug.

Aug 26 '22 09:08 alejandro-anadon

Hi,

Has anyone been able to reproduce this issue? I just want to confirm whether it really is a bug or a misunderstanding of ANTLR4 on my part. If it is a misunderstanding, what is my mistake? If it is a bug, I think @parrt is (or will soon be) working on bugs related to the ATN, and this one may be among them. If so, now might be a good time to look into its cause.

(I did not mention it at the start, but the target language is Java and this was tested with ANTLR 4.10.1.)

Aug 30 '22 14:08 alejandro-anadon

Hi. The use of semantic predicates is sometimes counterintuitive in the decision-making process. It is possible there is a bug, but I would have to dig down to see what the issue is, and I can't do that right now. I can definitely see semantic predicates that depend on parameters causing trouble, because you are effectively requiring an action (parameter passing is an assignment statement) during the evaluation of lookahead. No actions are possible during lookahead. So if I understand your question, value[false]+ is never going to work as you expect, because you are requiring an action to execute during lookahead, which is not allowed. It gives undefined behavior.

Aug 30 '22 16:08 parrt

Thanks for the reply; and even more knowing that you are focused on releasing the 4.11 release.

I am not really trying to execute any action during lookahead. What I am trying to do is deactivate a rule alternative according to a parameter passed to the rule. Maybe 'passing a parameter' is considered an action, and that is my misunderstanding.

Perhaps the following grammar clarifies what I am trying to say:

grammar CommaSeparatorField;

document : line+ EOF ;

line locals[int numMatch=0]
	 : SINGLEVALUE COLON value[true]+ {$numMatch++; System.out.print("* ");}
       LF {System.out.println("Single. Num match = "+$numMatch);}
                                			
	 | MULTIPLEVALUE COLON value[false]+  {$numMatch++;  System.out.print("* ");} 
	 	 ( COMMA value[false]+ {$numMatch++;  System.out.print(" * ");})* 
	   LF   {System.out.println("Multiple. Num match = "+$numMatch);} 	
	 ;

value [ boolean commaAllow]
	: TEXT
	| {<lets-see-below>}?  COMMA     //  <--This is the conflicting line	
	;

WS : ' ' -> skip;
LF : [\r\n]+ ;
SINGLEVALUE : 'SINLGE';
MULTIPLEVALUE : 'MULTIPLE';
COLON : ':';	
COMMA : ','  ;
TEXT : [a-zA-Z]+ ;

The idea is that if the input is:

SINLGE: aa
SINLGE: aa,bb
MULTIPLE: aa
MULTIPLE: aa  ,   bb

the desired output is (each * marks one 'value[true-or-false]+' match):

* Single. Num match = 1
* Single. Num match = 1
* Multiple. Num match = 1
*  * Multiple. Num match = 2

OK. If the conflicting line is:

| {true}?  COMMA

it works fine for 'SINGLE' lines, but not for 'MULTIPLE' lines:

* Single. Num match = 1
* Single. Num match = 1
* Multiple. Num match = 1
* Multiple. Num match = 1

If the conflicting line is:

| {false}?  COMMA

it works fine for 'MULTIPLE' lines, but not for 'SINGLE' lines:

* Single. Num match = 1
line 2:12 no viable alternative at input ','
* Multiple. Num match = 1
*  * Multiple. Num match = 2

OK. Then let's use the 'commaAllow' parameter passed to the 'value' rule, which is true for 'SINGLE' lines and false for 'MULTIPLE' lines. So let's try the conflicting line as:

| {$commaAllow}?  COMMA

The unexpected output is:

* Single. Num match = 1
* Single. Num match = 1
* Multiple. Num match = 1
line 5:12 no viable alternative at input ','
* Multiple. Num match = 1

I hope I have been clear, because I am not a native English speaker and I am constantly using online translators.

thanks

Aug 31 '22 12:08 alejandro-anadon

I don't have time to dig into this too deeply at the moment, but value[true]+ is definitely a loop that requires lookahead to decide whether to continue matching value rules. The semantic predicate will likely not be evaluated in order to make this choice. I can't easily interpret your expected output with the stars, sorry.

I think there's a larger issue here. Why are you using

SINGLEVALUE COLON value[true]+

Instead of just

SINGLEVALUE COLON value

??? If you only want one value then just match one.

If for some reason you are having difficulty using semantic predicates at parse time, simply match either construct and then check it semantically in the next phase after parsing. For example, a compiler would match variable x during parsing but then decide in a later pass that it is undefined.
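A minimal sketch of that two-phase idea in plain Java, assuming (hypothetically) that the grammar accepts commas everywhere and that a later pass receives the token texts of one line in order. The class PostParseSplit and the method values are invented for illustration; they are not part of ANTLR.

```java
import java.util.ArrayList;
import java.util.List;

final class PostParseSplit {
    /**
     * Regroups the token texts of one line after parsing.
     * If commaIsSeparator is true, each "," starts a new value;
     * otherwise the comma is kept as part of the single value.
     */
    static List<String> values(List<String> tokens, boolean commaIsSeparator) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokens) {
            if (commaIsSeparator && token.equals(",")) {
                result.add(current.toString()); // close the value collected so far
                current.setLength(0);
            } else {
                current.append(token);          // the token belongs to the current value
            }
        }
        result.add(current.toString());         // last (or only) value
        return result;
    }
}
```

With the tokens aa , bb this yields two values when the comma is a separator and the single value "aa,bb" when it is not. Whether this fits depends on the shape of parse tree you need, since the grouping happens after the tree is built.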

Aug 31 '22 21:08 parrt

Thanks again for your time.

I imagine that you are busy with the new release, and I don't want to take up your time; so if you want, leave this for when you can.

In any case, I have already found an alternative solution (a very cumbersome one). The objective of this issue is to determine whether this is a bug or a misunderstanding of the system and, if it is a misunderstanding, to have you explain where I am wrong.

Anyway, I will continue in this comment so you can dig into it when you have time.

The grammar I posted before is a maximally reduced version of a larger grammar, meant to show only what I consider a bug. But maybe that was a mistake: by focusing only on the bug, I made the overall semantics of the grammar meaningless.

So here is a slightly broader version that makes sense.

Note: I don't think I can use the solution you propose (checking semantically in a later phase), because the generated tree would not be built with the structure I want to give it.

grammar CommaSeparatorField;

document : line+ EOF ;

line : singlematchs COLON matches+=value[true] LF   // here we allow commas (true): a comma, if present, must not be parsed as a separator;
                                                    // it should be matched in the 'content' rule instead
             {
             System.out.println($ctx.singlematchs().getText()+" has value the single value '"+$ctx.matches.get(0).getText()+"'");
             System.out.println("matches must be 1 size element (single match). and, of course it has "+$ctx.matches.size()+" value");
             System.out.println();
             }
     | multiplematchs COLON matches+=value[false] ( COMMA matches+=value[false])* LF  // here we do NOT allow commas (false),
                                                                                       // because a comma, if present, must be
                                                                                       // matched as COMMA in this 'line' rule (it is a value separator),
                                                                                       // not in the 'content' rule
            {
            System.out.println($ctx.multiplematchs().getText()+" has "+$ctx.matches.size()+" values");
            System.out.println("(We could iterate over matches but i think it is not necesary)");
            System.out.println();
            }    
     ;

value[boolean commaAllow] : content[$commaAllow]+ ;

content[ boolean commaAllow]
    : TEXT
    | NUMBER
    | {$commaAllow}?  COMMA      // <-- conflicting line. "Play" around with 'true', 'false' and '$commaAllow'.

                                 // A) With 'true' it works fine for singlematchs, but not for multiplematchs (values cannot be split on commas).

                                 // B) With 'false' it works fine for multiplematchs, but not for singlematchs (if a value contains a comma, for example 'Fifth Avenue 44, 1st floor', a "mismatched input ',' expecting LF" error is reported).

                                 // C) With $commaAllow, when it arrives as 'true' (from singlematchs) it works fine. That's OK.
                                 //
                                 //    But when it arrives as 'false' (from multiplematchs), it is expected to behave
                                 //    as if a literal 'false' were there, and so to work fine for multiplematchs.

                                 //    It does not. It reports "no viable alternative at input ','" for each comma.
    ;
    
// Elements whose values may contain commas. For example: 'Fifth Avenue 44, 1st floor'
// In this case the comma is not a separator. It is part of the value.
singlematchs : NAME
             | ADDRESS
             ;

// Elements whose values may NOT contain commas (because the comma is the value separator)
multiplematchs : SPORTS
               | PETS
               ;    

NAME : 'NAME' ;
ADDRESS : 'ADDRESS' ;
SPORTS : 'SPORTS';
PETS : 'PETS';
WS : ' ' -> skip;  // Ignore the fact that spaces are therefore removed from the resulting values
LF : [\r\n]+ ;
COLON : ':';    
COMMA : ','  ;
NUMBER : [0-9]+;
TEXT : [a-zA-Z]+ ;

If the input is:

NAME : FakeName FakeSurname
ADDRESS : Fifth Avenue 44, 1st floor
SPORTS : golf
PETS: cat, dog , hamster

Desired output:

NAME has value the single value 'FakeNameFakeSurname'
matches must be 1 size element (single match). and, of course it has 1 value

ADDRESS has value the single value 'FifthAvenue44,1stfloor'
matches must be 1 size element (single match). and, of course it has 1 value

SPORTS has 1 values
(We could iterate over matches but i think it not necesary)

PETS has 3 values
(We could iterate over matches but i think it not necesary)

But instead I get:

NAME has value the single value 'FakeNameFakeSurname'
matches must be 1 size element (single match). and, of course it has 1 value

ADDRESS has value the single value 'FifthAvenue44,1stfloor'
matches must be 1 size element (single match). and, of course it has 1 value

SPORTS has 1 values
(We could iterate over matches but i think it not necesary)

line 4:9 no viable alternative at input ','
line 4:15 no viable alternative at input ','
PETS has 1 values
(We could iterate over matches but i think it not necesary)

If we change the conflicting line to:

| {true}?  COMMA

I get (good for single, bad for multiple):

NAME has value the single value 'FakeNameFakeSurname'
matches must be 1 size element (single match). and, of course it has 1 value

ADDRESS has value the single value 'FifthAvenue44,1stfloor'
matches must be 1 size element (single match). and, of course it has 1 value

SPORTS has 1 values
(We could iterate over matches but i think it is not necesary)

PETS has 1 values
(We could iterate over matches but i think it is not necesary)

And if we change the conflicting line to:

| {false}?  COMMA

I get (good for multiple, bad for single):

NAME has value the single value 'FakeNameFakeSurname'
matches must be 1 size element (single match). and, of course it has 1 value

line 2:25 mismatched input ',' expecting LF
SPORTS has 1 values
(We could iterate over matches but i think it is not necesary)

PETS has 3 values
(We could iterate over matches but i think it is not necesary)

In plain words:

- If the 'line' rule is parsing an element that must be a single match (name or address), I try to 'send' true to the 'content' rule so that it accepts commas (they are treated as part of the text).

- If the 'line' rule is parsing an element that can have multiple matches (sports or pets), I try to 'send' false to the 'content' rule so that it does NOT accept commas (any comma is then parsed in the 'line' rule as a field separator).

I hope I was clearer this time.

Thanks in advance for the help.

Sep 01 '22 13:09 alejandro-anadon

Hi, I found a way to make it work, but it is a very dirty one.

It consists of creating a variable ('globalCommaAllow') in the parser that is assigned in the parent rule (in our case the 'value' rule) before calling the child rule (the 'content' rule). When the child rule returns, the previous value is restored.

This "simulates" sending the context variable (commaAllow) to the child rule and works around (dirty, but it works) the problem that the child rule misbehaves when the passed parameter is used as a predicate condition, because the predicate now reads 'globalCommaAllow' instead of '$commaAllow'.

Here is the grammar. (NOTE: I intentionally keep passing the 'commaAllow' parameter from 'value' to 'content' to make it easier to test the grammar by switching between "{globalCommaAllow}?" (working) and "{$commaAllow}?" (not working). That parameter can be removed if we only want to see how the workaround behaves.)

grammar CommaSeparatorField;

@parser::members
	{
	boolean globalCommaAllow;
	}

document : line+ EOF ;

line : singlematchs COLON matches+=value[true] LF   // here we allow commas (true): a comma, if present, must not be parsed as a separator;
                                                    // it should be matched in the 'content' rule instead
             {
             System.out.println($ctx.singlematchs().getText()+" has value the single value '"+$ctx.matches.get(0).getText()+"'");
             System.out.println("matches must be 1 size element (single match). and, of course it has "+$ctx.matches.size()+" value");
             System.out.println();
             }
     | multiplematchs COLON matches+=value[false] ( COMMA matches+=value[false])* LF  // here we do NOT allow commas (false),
                                                                                       // because a comma, if present, must be
                                                                                       // matched as COMMA in this 'line' rule (it is a value separator),
                                                                                       // not in the 'content' rule
            {
            System.out.println($ctx.multiplematchs().getText()+" has "+$ctx.matches.size()+" values");
            System.out.println("(We could iterate over matches but i think it is not necesary)");
            System.out.println();
            }    
     ;

value[boolean commaAllow] 
locals [boolean previousGlobalCommaAllow] :
		{
		$previousGlobalCommaAllow=globalCommaAllow;
		globalCommaAllow=$commaAllow;
		}
 		content[$commaAllow]+    // The parameter is unnecessary here if 'content' uses the global variable 'globalCommaAllow'.
 								 // It is kept intentionally so the two cases are easy to swap when testing.
 		{
 		globalCommaAllow=$previousGlobalCommaAllow;
 		}
 	;

content[ boolean commaAllow]
    : TEXT
    | NUMBER
    | {globalCommaAllow}?  COMMA  // Changing globalCommaAllow to $commaAllow makes it stop working.
    ;
    
// Elements whose values may contain commas. For example: 'Fifth Avenue 44, 1st floor'
// In this case the comma is not a separator. It is part of the value.
singlematchs : NAME
             | ADDRESS
             ;

// Elements whose values may NOT contain commas (because the comma is the value separator)
multiplematchs : SPORTS
               | PETS
               ;    

NAME : 'NAME' ;
ADDRESS : 'ADDRESS' ;
SPORTS : 'SPORTS';
PETS : 'PETS';
WS : ' ' -> skip;  // Ignore the fact that spaces are therefore removed from the resulting values
LF : [\r\n]+ ;
COLON : ':';    
COMMA : ','  ;
NUMBER : [0-9]+;
TEXT : [a-zA-Z]+ ;

This way we get the result we are looking for.
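The save-and-restore idea used in the 'value' rule above can be sketched in plain Java, outside any grammar. The class ScopedFlag and the helper withCommaAllow are invented names for illustration. The try/finally is an addition of this sketch: the grammar above restores the flag in a trailing action, and I have not checked whether that action still runs if the rule exits through an error path.

```java
import java.util.function.Supplier;

final class ScopedFlag {
    // Stands in for the parser member 'globalCommaAllow' from the grammar.
    static boolean globalCommaAllow = false;

    /** Runs body with the flag set to allow, then restores the previous value. */
    static <T> T withCommaAllow(boolean allow, Supplier<T> body) {
        boolean previous = globalCommaAllow;   // like $previousGlobalCommaAllow
        globalCommaAllow = allow;
        try {
            return body.get();
        } finally {
            globalCommaAllow = previous;       // restored even if body throws
        }
    }

    public static void main(String[] args) {
        String trace = withCommaAllow(true, () -> {
            boolean outer = globalCommaAllow;
            String inner = withCommaAllow(false, () -> String.valueOf(globalCommaAllow));
            // After the nested call returns, the outer value is back in place.
            return outer + "," + inner + "," + globalCommaAllow;
        });
        System.out.println(trace + "," + globalCommaAllow); // prints "true,false,true,false"
    }
}
```

Nesting works because each call keeps its own saved value, which mirrors declaring previousGlobalCommaAllow as a rule local.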

Sep 05 '22 17:09 alejandro-anadon

I think I'm remembering a critical rule: semantic predicates can only depend on values available prior to the initiation of a prediction. Even passing parameters along is not guaranteed to work.

Sep 05 '22 17:09 parrt

Yep. A global will work because it doesn't violate the rule that actions cannot execute during a prediction event.
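To make the difference concrete, here is a toy model in plain Java of the decision taken in a value+ loop when the next token is ','. It is not ANTLR's prediction code. It rests on one assumption that matches the results reported in this thread: a predicate that reads a rule parameter cannot be evaluated while prediction runs from the calling rule, so it is treated as passing there and is only checked once the rule has been entered. The names PredictionToy, Guard and usesRuleParameter are invented.

```java
import java.util.function.BooleanSupplier;

final class PredictionToy {
    /** Predicate guarding the COMMA alternative of a toy 'value' rule. */
    static final class Guard {
        final BooleanSupplier test;
        final boolean usesRuleParameter; // true for {$commaAllow}?, false for a global or a constant

        Guard(BooleanSupplier test, boolean usesRuleParameter) {
            this.test = test;
            this.usesRuleParameter = usesRuleParameter;
        }
    }

    /** What the toy parser does in the 'value+' loop when LT(1) is ','. */
    static String onComma(Guard guard) {
        // Phase 1: prediction, started in the calling rule. Assumption: a
        // parameter-dependent predicate is not evaluated here and counts as true.
        boolean commaLooksViable = guard.usesRuleParameter || guard.test.getAsBoolean();
        if (!commaLooksViable) {
            return "exit loop; caller matches COMMA";
        }
        // Phase 2: the parser has entered 'value', so the parameter now has a
        // value and the predicate is really evaluated.
        return guard.test.getAsBoolean()
                ? "value matches COMMA"
                : "no viable alternative at input ','";
    }

    public static void main(String[] args) {
        System.out.println(onComma(new Guard(() -> false, false))); // global or constant false
        System.out.println(onComma(new Guard(() -> false, true)));  // $commaAllow == false
        System.out.println(onComma(new Guard(() -> true, true)));   // $commaAllow == true
    }
}
```

Under that assumption the three cases line up with the outputs reported earlier in the thread: a false global or constant leaves the loop cleanly, a false parameter produces the error, and a true parameter lets the comma be consumed inside the rule.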

Sep 05 '22 17:09 parrt

Thank you very much for the reply.

Seeing the results, what you describe is what is happening: passed parameters should not be used to condition the decision, and if you try, you get unexpected results.

Do you think it would be worthwhile to change this behavior so that passed parameters can be used in semantic predicate conditions, as in this grammar, and give the expected results? Perhaps by including those variables in the prediction initialization (I speak from ignorance)?

I don't know whether many users would use this feature, but in my case it would greatly simplify the grammar, because it makes it easy to activate and deactivate sub-rules depending on the parent rule. But I imagine that, given the complexity of the algorithm, I may be asking for the impossible.

Thanks again for your time.

Sep 05 '22 17:09 alejandro-anadon

yeah, can't execute actions during the prediction as all hell would break loose.

Sep 05 '22 17:09 parrt
