Lexer issues, in particular: Java backends do not accept/require whitespace between consecutive tokens

andreasabel opened this issue 5 years ago

The following grammar should parse the input `⟦ ab c`, which has whitespace between the tokens.

```
Whatever. Main ::= Uni Foo Bar;

token Uni '⟦' ;
token Foo letter letter;
token Bar (char - 'a');
```
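Note that the token definitions overlap: `char - 'a'` matches any single character other than `a`, including the space and `⟦` itself. The minimal Python sketch below makes this overlap visible. It is a hypothetical illustration, not BNFC output: `TOKENS` and `matching_tokens` are made-up names, and BNFC's `letter` class is approximated here by ASCII letters.

```python
import re

# Hypothetical regex translations of the three LBNF token definitions.
# `letter` is approximated by ASCII letters; BNFC's own class may be wider.
TOKENS = {
    "Uni": re.compile("⟦"),
    "Foo": re.compile("[A-Za-z][A-Za-z]"),
    "Bar": re.compile("[^a]"),  # any one character except 'a', incl. ' ' and '\n'
}


def matching_tokens(lexeme):
    """Return the names of all token types whose pattern matches the whole lexeme."""
    return [name for name, rx in TOKENS.items() if rx.fullmatch(lexeme)]


for lexeme in ["⟦", "ab", " ", "c"]:
    print(repr(lexeme), matching_tokens(lexeme))
# '⟦' ['Uni', 'Bar']
# 'ab' ['Foo']
# ' ' ['Bar']
# 'c' ['Bar']
```

So a lexer for this grammar must decide whether a single space is whitespace to skip or a `Bar` token.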

This is the situation in the different backends:

[x] Haskell: parses the input

[ ] OCaml: ocamllex rejects the generated lexer definition with the error

```
File "Lextest.mll", line 42, character 11: illegal escape sequence \1.
```

[ ] C: parsing fails with `error: 1,1: syntax error at ?`

[ ] CPP: parsing fails with `Parse error on line 1`

[ ] Java: parsing fails with

```
Syntax Error, trying to recover and continue parse... for input symbol "" spanning from unknown:-1/-1(-1) to unknown:-1/-1(-1)
At line -1, near "ab c" :
   Unrecoverable Syntax Error
```

[ ] Java/ANTLR: parsing fails with

```
line 1:1 extraneous input ' ' expecting Foo
At line 1, column 1 :
   extraneous input ' ' expecting Foo
```

Instead, the parsers generated by the Java backends accept the input without the spaces: `⟦abc`.
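One possible explanation, not confirmed in this issue, is how the lexer breaks ties when a single space can be read either as whitespace or as a `Bar` token. The ANTLR message reports `' '` as an input token, which is consistent with that reading. The sketch below is a hypothetical maximal-munch lexer in Python (`lex`, `RULES`, `WS` and `whitespace_first` are made-up names) in which equal-length matches go to the earlier rule. It models only this tie-breaking, not the actual generated lexers.

```python
import re

# Hypothetical rule tables; `letter` approximated by ASCII letters.
WS = ("WS", re.compile(r"\s+"))
RULES = [
    ("Uni", re.compile("⟦")),
    ("Foo", re.compile("[A-Za-z][A-Za-z]")),
    ("Bar", re.compile("[^a]")),  # any one character except 'a'
]


def lex(text, whitespace_first):
    """Longest match wins; on equal length, the earlier rule wins."""
    rules = [WS] + RULES if whitespace_first else RULES + [WS]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, name, lexeme) of the longest match so far
        for name, rx in rules:
            m = rx.match(text, pos)
            # strict '>' keeps the earlier rule when lengths tie
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, m.group())
        if best is None:
            raise ValueError(f"lexical error at offset {pos}")
        length, name, lexeme = best
        if name != "WS":  # whitespace is skipped, everything else is emitted
            tokens.append((name, lexeme))
        pos += length
    return tokens


print(lex("⟦ ab c", whitespace_first=True))
# [('Uni', '⟦'), ('Foo', 'ab'), ('Bar', 'c')]
print(lex("⟦ ab c", whitespace_first=False))
# [('Uni', '⟦'), ('Bar', ' '), ('Foo', 'ab'), ('Bar', ' '), ('Bar', 'c')]
print(lex("⟦abc", whitespace_first=False))
# [('Uni', '⟦'), ('Foo', 'ab'), ('Bar', 'c')]
```

With whitespace listed last, each single space becomes a `Bar` token, so a parser expecting `Foo` after `⟦` fails at column 1, while the space-free input `⟦abc` still yields `Uni Foo Bar`.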

Nov 13 '20 08:11 andreasabel