bnfc
Lexer issues, in particular: Java backends do not accept/require whitespace between consecutive tokens
The following grammar should parse the input `⟦ ab c`:
Whatever. Main ::= Uni Foo Bar;
token Uni '⟦' ;
token Foo letter letter;
token Bar (char - 'a');
This is the situation in the different backends:
- [x] Haskell: yes
- [ ] OCaml: `ocamllex` refuses the generated lexer definition with error `File "Lextest.mll", line 42, character 11: illegal escape sequence \1.`
- [ ] C: parsing fails with `error: 1,1: syntax error at ?`
- [ ] CPP: parsing fails with `Parse error on line 1`
- [ ] Java: parsing fails with `Syntax Error, trying to recover and continue parse... for input symbol "" spanning from unknown:-1/-1(-1) to unknown:-1/-1(-1) At line -1, near "ab c" : Unrecoverable Syntax Error`
- [ ] Java/ANTLR: parsing fails with `line 1:1 extraneous input ' ' expecting Foo At line 1, column 1 : extraneous input ' ' expecting Foo`
Instead, the parsers generated by the Java backends accept the input without the spaces: `⟦abc`.
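For reference, here is a minimal Python sketch of the tokenization I would expect for both inputs. It is not BNFC's generated code. It assumes longest-match lexing with ties broken by rule order, and it assumes the whitespace rule is tried before the user-defined tokens. That ordering matters because `char - 'a'` also matches a space (and `⟦`). The `SKIP` rule name, the `tokenize` helper, and the ASCII-only approximation of `letter` are all made up for illustration.

```python
import re

# Rules in priority order: on equal-length matches the earlier rule wins.
# "SKIP" is a hypothetical name for the implicit whitespace rule.
RULES = [
    ("SKIP", re.compile(r"\s+")),
    ("Uni", re.compile("⟦")),
    ("Foo", re.compile(r"[A-Za-z]{2}")),  # approximation of `letter letter`
    ("Bar", re.compile(r"[^a]")),         # `char - 'a'`: any character except 'a'
]


def tokenize(text):
    """Return (token_name, lexeme) pairs, dropping skipped whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so ties are resolved in favour of the earlier rule.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        if best_name != "SKIP":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("⟦ ab c"))
print(tokenize("⟦abc"))
```

Under these assumptions both inputs yield `[('Uni', '⟦'), ('Foo', 'ab'), ('Bar', 'c')]`, so a parser for `Main ::= Uni Foo Bar` should accept either. If the whitespace rule were tried after `Bar`, the space in `⟦ ab c` would be lexed as a `Bar` token, which might explain the ANTLR message above. I have not checked the generated lexers to confirm this.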