Lexer issues, in particular: Java backends do not accept/require whitespace between consecutive tokens

Open andreasabel opened this issue 5 years ago • 0 comments

The following grammar should parse the input ⟦ ab c (with the spaces):

Whatever. Main ::= Uni Foo Bar;

token Uni '⟦' ;
token Foo letter letter;
token Bar (char - 'a');

This is the situation in the different backends:

  • [x] Haskell: yes
  • [ ] OCaml: ocamllex rejects the generated lexer definition with the error
    File "Lextest.mll", line 42, character 11: illegal escape sequence \1.
    
  • [ ] C: parsing fails with error: 1,1: syntax error at ?
  • [ ] CPP: parsing fails with Parse error on line 1
  • [ ] Java: parsing fails with
    Syntax Error, trying to recover and continue parse... for input symbol "" spanning from unknown:-1/-1(-1) to unknown:-1/-1(-1)
    At line -1, near "ab c" :
       Unrecoverable Syntax Error
    
  • [ ] Java/ANTLR: parsing fails with
    line 1:1 extraneous input ' ' expecting Foo
    At line 1, column 1 :
       extraneous input ' ' expecting Foo
    

Instead, the parsers generated by the Java backends accept the input without the spaces: ⟦abc.
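
For reference, here is a minimal sketch of the tokenization I would expect for this grammar. It is not BNFC-generated code: the rule table, the `tokenize` helper, the priority order (whitespace first, then the tokens in declaration order), and the approximation of `letter` as ASCII letters are all assumptions made for illustration.

```python
import re

# Hypothetical rule table in priority order; a name of None means "skip".
RULES = [
    (None,  re.compile(r"\s+")),              # whitespace separates tokens and is dropped
    ("Uni", re.compile(r"⟦")),                # token Uni '⟦'
    ("Foo", re.compile(r"[A-Za-z]{2}")),      # token Foo letter letter (ASCII approximation)
    ("Bar", re.compile(r"[^a]", re.DOTALL)),  # token Bar (char - 'a')
]

def tokenize(text):
    """Longest-match lexer; on equal length the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so ties are resolved in favour of the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no token matches at offset {pos}")
        name, m = best
        if name is not None:
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens

print(tokenize("⟦ ab c"))
print(tokenize("⟦abc"))
```

Under these assumptions both inputs produce the same three tokens (Uni, Foo, Bar). Note that a space also matches the pattern of Bar, so in this sketch the result depends on the whitespace rule taking priority over Bar.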

andreasabel · Nov 13 '20 08:11