grammars-v4 icon indicating copy to clipboard operation
grammars-v4 copied to clipboard

[rexx] Hexstrings and binarystrings aren't parsed correctly

Open RossPatterson opened this issue 3 years ago • 2 comments

Rexx has support for string literals in hexadecimal and binary. Hexstrings are, per ISO standard section 6.2.2.38, strings of traditional hex digits without regard to case, with optional blanks between pairs of digits (counting from the right), with an upper or lower case X appended after the closing quote (e.g., "1234"x, '12 34'X,). Similarly, binarystrings are, per section 6.2.2.41, strings of zeroes and ones with optional blanks between quadtuples of digits, with an upper or lower case B appended after the closing quote (e.g., "10100001"B, '1010 0001'b.

The rexx grammar collapses those into the STRING token type, and accepts invalid values (e.g., "1234"B, '12 3'x). The grammar should represent hex and binary strings separately from "normal" strings, and should not accept invalid forms.

I am working on a change to implement correct parsing for hex and binary strings. It will also do so in pure ANTLR, unlike the current implementation, which embeds Java code in a lexer action.

RossPatterson avatar Aug 05 '22 22:08 RossPatterson

~~I could use some advice here on proper testing.~~ (see update below) There are cases that are almost syntactically valid hexstrings or binarystrings, but only almost. Such cases usually have a different interpretation that is valid. Thus these bad strings parse successfully, but result in a different rule being applied. The current grammars_v4 test suite considers a test successful if it parses, and a failure if not. That won't work for these cases.

For example, 'hi mom!'x is obviously not a legal hexstring. But it is a legal concatenation-by-abuttal expressrion, and it parses correctly as one by the grammar.

I think the correct way to test this is to compare the output of grun -tree with a reference file, and to fail the test if they don't match byte-for-byte. But I'm not sure how to work this into the test suite.

Update: I figured out how the _grammar-test stuff works, and I now know how to verify the parse tree against an expected result.

RossPatterson avatar Aug 08 '22 00:08 RossPatterson

As noted in #2755 , GrammarTester won't even compile as currently set up (it's a trivial fix, but it means the tester isn't in active use). But it turns out I don't need to use GrammarTester to verify a parse tree - that's an optional function of antlr4/antlr4test-maven-plugin.

RossPatterson avatar Aug 16 '22 01:08 RossPatterson