Antlr 4.10 preview: errors in antlr/antlr3 and v/

Open kaby76 opened this issue 3 years ago • 5 comments

I've been updating the Antlr4 tool for the Go target and decided to test the new tool on grammars-v4/, as this is a more extensive test than the unit tests for the Antlr4 tool.

  • error(153): ANTLRv3Lexer.g4:113:0: rule DOUBLE_ANGLE_STRING_LITERAL contains a closure with at least one alternative that can match an empty string
  • error(186): V.g4:670:0: rule elementList contains a closure with at least one alternative that can match EOF
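
To illustrate what error(153) is complaining about, here is a minimal Python sketch of a nullable-closure check over a toy grammar representation. It is not how the Antlr4 tool implements the check (the tool works on its own internal structures); the tuple-based node format and the function names `nullable` and `epsilon_closures` are hypothetical, chosen only for this example. A rule is flagged when it contains a `*` or `+` whose body can match the empty string, for example something shaped like `( 'x' | 'y'? )*`.

```python
# Toy grammar nodes (hypothetical format, for illustration only):
#   ("lit", text) | ("ref", rule_name) | ("seq", n1, n2, ...) | ("alt", n1, n2, ...)
#   ("star", n)   | ("plus", n)        | ("opt", n)

def nullable(node, rules, visiting=frozenset()):
    """Return True if the node can match the empty string."""
    kind = node[0]
    if kind == "lit":
        return node[1] == ""
    if kind == "ref":
        name = node[1]
        if name in visiting:
            # Recursive reference: conservatively treat as not nullable.
            return False
        return nullable(rules[name], rules, visiting | {name})
    if kind == "seq":
        return all(nullable(child, rules, visiting) for child in node[1:])
    if kind == "alt":
        return any(nullable(child, rules, visiting) for child in node[1:])
    if kind in ("star", "opt"):
        return True
    if kind == "plus":
        return nullable(node[1], rules, visiting)
    raise ValueError(f"unknown node kind: {kind}")


def epsilon_closures(rules):
    """Return the names of rules containing a closure whose body is nullable."""
    def has_bad_closure(node):
        kind = node[0]
        if kind in ("lit", "ref"):
            return False
        if kind in ("star", "plus") and nullable(node[1], rules):
            return True
        return any(has_bad_closure(child) for child in node[1:])

    return [name for name, body in rules.items() if has_bad_closure(body)]
```

An optional element on its own (`'z'?`) is not flagged; only a closure wrapped around something that can be empty is, which matches the wording of the error message.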

kaby76 avatar Jan 16 '22 19:01 kaby76

Thanks for testing 4.9.4 (actually 4.10) on the entire grammars repository. I just wanted to start a discussion about that. Also, the caseInsensitive option should be tested (I've already started on it).

As I see it, the errors are completely valid. But V.g4 should probably produce several "contains a closure with at least one alternative that can match EOF" errors (see the rules with eos under a closure).

KvanTTT avatar Jan 16 '22 20:01 KvanTTT

@KvanTTT I added a check to trgen to fail on caseInsensitive = true but I just forgot what you were working on. Sorry. I'll fix the check so that it doesn't crash on the value. I'll fix antlr/antlr3 and v soon when I can get past https://github.com/antlr/antlr4/pull/3486

kaby76 avatar Jan 16 '22 20:01 kaby76

> I added a check to trgen to fail on caseInsensitive = true but I just forgot what you were working on. Sorry. I'll fix the check so that it doesn't crash on the value.

Yes, I've also transformed some grammars that use letter fragments (TOKEN: T O K E N; -> TOKEN: 'TOKEN';), but I have not finished yet.
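
The fragment-to-literal rewrite mentioned above could be sketched roughly as follows. This is an assumption about how such a transformation might be scripted, not the tooling actually used: the function name `collapse_letter_fragments` is hypothetical, and the regex assumes each letter fragment is named by a single uppercase letter (A to Z) and that the rule fits on one line. The grammar would still need `caseInsensitive = true` set in its options separately, and the now-unused fragments removed by hand.

```python
import re

# Matches a one-line lexer rule whose body consists only of single-letter
# fragment references, e.g. "SELECT: S E L E C T;".
# Assumption: letter fragments are named A..Z.
LETTER_RULE = re.compile(
    r"^(?P<name>[A-Z_][A-Z0-9_]*)\s*:\s*(?P<body>(?:[A-Z]\s+)*[A-Z])\s*;",
    re.MULTILINE,
)


def collapse_letter_fragments(grammar_text: str) -> str:
    """Rewrite rules like TOKEN: T O K E N; into TOKEN: 'TOKEN';"""
    def replace(match: re.Match) -> str:
        literal = "".join(match.group("body").split())
        return f"{match.group('name')}: '{literal}';"

    return LETTER_RULE.sub(replace, grammar_text)
```

Rules whose bodies contain anything other than space-separated single letters are left untouched, so the script is safe to run over a whole lexer file before reviewing the diff.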

KvanTTT avatar Jan 17 '22 10:01 KvanTTT

The v/ grammar here is not very good. It does not parse most files in the "vlib" runtime library. So, we really should consider replacing it. (I have a fix for the antlr/antlr3 grammar.)

We could hope that github/vlang/ adds an EBNF grammar and use that as a basis here. There are various requests for an EBNF for V, but it will likely never be done (1).

Of course, there is a parser for V. It is hand-written code (why would it be any different for V than for every other language in our miserable profession) (2). It's noted that "[u]nlike many other languages, V is not going to be always changing" (3), but there have been 13 changes in parser.v over the last month alone (4). Scraping that would be difficult.

The VSCode extension for V (5) is a TextMate implementation, so at best it gives only the lexical structure, in the great EBNF syntax JSON at that.

I recommend redoing the grammar by scraping it from the Tree Sitter grammar for V (6 or 7) within the LSP server code. It is being maintained (attachment: v-raw-scrapped.txt). However, I don't know whether one would end up right back here, with a grammar that isn't very good.

kaby76 avatar Jan 21 '22 15:01 kaby76

I've reworked the converter from tree-sitter to pseudo-Antlr4 grammars. As it turns out, one can't really use grammar.js as input for the conversion, because tree-sitter performs a partial evaluation of the .js code. The table used in the V grammar for expressions must be converted by tree-sitter itself. The converter must therefore use the output from tree-sitter, the grammar.json file. The raw grammar for V is here: grammar.txt
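
For readers curious what a conversion from grammar.json looks like, here is a minimal Python sketch. It is not the converter described above; the function names `to_antlr`, `convert_grammar`, and `convert_file` are hypothetical. It assumes the usual shape of tree-sitter's generated grammar.json (a top-level "rules" object whose nodes carry a "type" such as SEQ, CHOICE, REPEAT, SYMBOL, or STRING), drops precedence, field, alias, and token wrappers, and leaves regex patterns as placeholders that would need manual translation.

```python
import json

# Wrapper nodes whose only grammar-relevant part is their "content".
WRAPPERS = {
    "PREC", "PREC_LEFT", "PREC_RIGHT", "PREC_DYNAMIC",
    "FIELD", "ALIAS", "TOKEN", "IMMEDIATE_TOKEN",
}


def to_antlr(node: dict) -> str:
    """Render one grammar.json node as pseudo-Antlr4 text."""
    kind = node["type"]
    if kind == "BLANK":
        return ""
    if kind == "STRING":
        escaped = node["value"].replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if kind == "PATTERN":
        # Placeholder only: tree-sitter regexes are not Antlr4 syntax.
        return f"/{node['value']}/"
    if kind == "SYMBOL":
        return node["name"]
    if kind == "SEQ":
        parts = [to_antlr(member) for member in node["members"]]
        return " ".join(part for part in parts if part)
    if kind == "CHOICE":
        members = node["members"]
        alts = [to_antlr(m) for m in members if m["type"] != "BLANK"]
        body = " | ".join(alts)
        # A BLANK alternative is how tree-sitter encodes optional().
        optional = len(alts) != len(members)
        return f"({body})?" if optional else f"({body})"
    if kind == "REPEAT":
        return f"({to_antlr(node['content'])})*"
    if kind == "REPEAT1":
        return f"({to_antlr(node['content'])})+"
    if kind in WRAPPERS:
        return to_antlr(node["content"])
    raise ValueError(f"unhandled node type: {kind}")


def convert_grammar(grammar: dict) -> str:
    """Render every rule of a parsed grammar.json as 'name : body ;' lines."""
    return "\n".join(
        f"{name} : {to_antlr(rule)} ;" for name, rule in grammar["rules"].items()
    )


def convert_file(path: str) -> str:
    """Load a grammar.json file from disk and convert it."""
    with open(path, encoding="utf-8") as handle:
        return convert_grammar(json.load(handle))
```

Because the precedence wrappers are simply discarded, the output is only a raw starting point: ambiguities that tree-sitter resolves through precedence would have to be handled separately in the Antlr4 grammar.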

kaby76 avatar Jan 26 '22 00:01 kaby76