Ambiguity in Rascal grammar in an expression with both `^` and `$` does not arise from a semantic ambiguity

Open eh-adblockplus-org opened this issue 7 years ago • 21 comments

The following grammar fails to compile. The WholeLine production triggers an ambiguity in the Rascal grammar itself. sym2symbol has no case for the resulting ambiguity node, so its default case throws an exception that terminates compilation.

module ssce
lexical WholeLine = ^ Contents $ ;
lexical WholeLineWorkaround = (^ Contents ()) $ ;
lexical Contents = ![\r\n]+ ;

The grammar rules that provided the opportunity for ambiguity are these, from Rascal.rsc:

syntax Sym
[...]
	| endOfLine: Sym symbol "$" 
	| startOfLine: "^" Sym symbol

There is no associativity declared for these productions. Since nothing disambiguates them, you can't put both on the same symbol without causing an ambiguity: ^ Contents $ can be read either as (^ Contents) $ or as ^ (Contents $). The semantics of WholeLine, however, are unambiguous. The Rascal grammar here is simply counter-intuitive.
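To make the two readings concrete, here is a minimal sketch in Python (not Rascal, so it runs without the Rascal toolchain). It models only the two productions quoted above plus a bare name, and lists every parse tree for a token sequence. The function name parses and the bracketed tree notation are made up for illustration; this is not how the Rascal parser is implemented.

```python
from functools import lru_cache

# Simplified model (an assumption, not the real Rascal grammar):
#   Sym ::= NAME
#         | Sym "$"      (endOfLine)
#         | "^" Sym      (startOfLine)
def parses(tokens):
    """Return every parse tree for `tokens` as a bracketed string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(i, j):
        # All trees that derive tokens[i:j] from Sym.
        trees = []
        if j - i == 1 and tokens[i] not in ("^", "$"):
            trees.append(tokens[i])
        if j - i >= 2 and tokens[j - 1] == "$":
            trees += [f"endOfLine({t} $)" for t in sym(i, j - 1)]
        if j - i >= 2 and tokens[i] == "^":
            trees += [f"startOfLine(^ {t})" for t in sym(i + 1, j)]
        return tuple(trees)

    return list(sym(0, len(tokens)))


for tree in parses(["^", "Contents", "$"]):
    print(tree)
# endOfLine(startOfLine(^ Contents) $)
# startOfLine(^ endOfLine(Contents $))
```

Both trees cover the same three tokens, which is exactly the ambiguity node the compiler trips over.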

I was working inside Eclipse (0.9 stable, although I first had the problem with the unstable branch), and the compilation error was silent. It showed up only in the Eclipse Error Log window, which is not part of the default Rascal perspective. I only saw the compilation error after I made a fresh Eclipse installation and loaded the project before the Rascal perspective took over.

Furthermore, this error was not showing up as a syntax error in the editing window. Apparently there's some mismatch in behavior between the IDE grammar and the compiler grammar for this case.

The ordinary workaround for non-associative operators is to use parentheses to make the association explicit. This doesn't work in the current grammar, though, because of the way that parenthesized terms work.

syntax Sym
[...]
	| sequence: "(" Sym first Sym+ sequence ")"

Parentheses only apply to lists of two or more symbols and do not apply to a single symbol. Thus the workaround adds an extra pair of parentheses as an empty symbol, just to supply a second symbol for the disambiguating parentheses. It's not clear to me why there's no single-symbol production for parentheses, but there isn't, and that turns the obvious workaround into something more complicated.
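The effect of the two-or-more rule can be checked with a small model. The sketch below is Python, not Rascal, and counts derivations for a cut-down Sym grammar containing only a bare name, the empty symbol "(" ")", the sequence production, and the ^ and $ productions. The name count_parses and the token-list input are assumptions for illustration. It says nothing about the real parser beyond the productions quoted in this report.

```python
from functools import lru_cache

def count_parses(tokens):
    """Count derivations of `tokens` from Sym in a cut-down model grammar."""
    tokens = tuple(tokens)
    special = {"^", "$", "(", ")"}

    @lru_cache(maxsize=None)
    def sym(i, j):
        # Number of ways tokens[i:j] derives a single Sym.
        if j <= i:
            return 0
        n = 0
        if j - i == 1 and tokens[i] not in special:
            n += 1                              # bare nonterminal name
        if j - i == 2 and tokens[i] == "(" and tokens[i + 1] == ")":
            n += 1                              # empty symbol: "(" ")"
        if j - i >= 4 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += seq(i + 1, j - 1, 2)           # sequence: "(" Sym Sym+ ")"
        if tokens[j - 1] == "$":
            n += sym(i, j - 1)                  # endOfLine: Sym "$"
        if tokens[i] == "^":
            n += sym(i + 1, j)                  # startOfLine: "^" Sym
        return n

    @lru_cache(maxsize=None)
    def seq(i, j, at_least):
        # Ways to split tokens[i:j] into `at_least` or more consecutive Syms.
        if i == j:
            return 1 if at_least <= 0 else 0
        return sum(sym(i, k) * seq(k, j, max(at_least - 1, 0))
                   for k in range(i + 1, j + 1))

    return sym(0, len(tokens))


print(count_parses("^ Contents $".split()))          # 2: ambiguous
print(count_parses("( ^ Contents ) $".split()))      # 0: no single-symbol parentheses
print(count_parses("( ^ Contents ( ) ) $".split()))  # 1: the workaround
```

In this model, plain parentheses around ^ Contents are rejected outright, and only the padded form parses exactly once, which matches the behaviour of WholeLineWorkaround.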

This one issue report contains more than one problem, admittedly. Here's a list of the things to do:

  • Semantically speaking, ^ and $ are just special versions of precede and follow rules, which do have associativity defined. Perhaps the easiest fix is to move the productions for ^ and $ in with those of precede and follow. If that's not right for some reason, they could just get their own associativity group.
  • Ensure that there's a production for "single-symbol sequences". These are simple disambiguating parentheses. Either add another production for it or allow existing sequences to contain only one element.
  • Ensure that ambiguity errors that arise from the Rascal grammar itself appear as syntax errors in the IDE. Given the code I've seen, I'd guess that the lack of reporting might be generic to all ambiguity errors. For example, this production shows the same symptoms: lexical AnotherLine = ^ Contents ? ;
  • Add the Eclipse Error Log to the Rascal perspective. It will expose errors to the user that would otherwise be masked.

eh-adblockplus-org • Jul 18 '18 02:07