grammars-v4 icon indicating copy to clipboard operation
grammars-v4 copied to clipboard

pictureString in Cobol85 grammar

Open kevinlano opened this issue 3 years ago • 5 comments

This gives an incorrect parse:

java org.antlr.v4.gui.TestRig Cobol85 pictureString -tree
(pictureString (pictureChars (integerLiteral 9)) (pictureChars () (pictureChars (integerLiteral 18)) (pictureChars )))

Should be

(pictureString (pictureChars (integerLiteral 9)) (pictureCardinality ( (integerLiteral 18) ) )  )

kevinlano avatar Oct 31 '22 18:10 kevinlano

pictureString
: ( pictureChars+ pictureCardinality? )+
;
pictureChars
: DOLLARCHAR
| IDENTIFIER
| NUMERICLITERAL
| SLASHCHAR
| COMMACHAR
| DOT
| COLONCHAR
| ASTERISKCHAR
| DOUBLEASTERISKCHAR
| LPARENCHAR
| RPARENCHAR
| PLUSCHAR
| MINUSCHAR
| LESSTHANCHAR
| MORETHANCHAR
| integerLiteral
;
pictureCardinality
: LPARENCHAR integerLiteral RPARENCHAR
;

There is an ambiguity in the rule pictureString. pictureChars covers all the elements in pictureCardinality and the parser chooses pictureChars over the latter. I'm not familiar with COBOL85, does the spec allow unmatched parentheses in pictureString?

msagca avatar Oct 31 '22 19:10 msagca

( and ) cannot normally occur in a picture string in COBOL 85, apart from the delimiters of the cardinality part. The only exception is immediately following an I character:

"Each I in the character string indicates that the nonblank character immediately following it is treated as a simple insertion character." (COBOL 85 manual).

kevinlano avatar Nov 03 '22 11:11 kevinlano

@kevinlano,

In the manual, on page 157, it says:

The allowable PICTURE clause symbols are A, B, I, N, P, S, V, X, Z, 0, 1, 9, slant (/),
comma (,), period (.), plus sign (+), minus sign (-), asterisk (*), currency symbol
(usually $), CR, and DB. Refer to the paragraphs under “Symbols” in this section for
information on each of these symbols.

and on page 164:

The symbols A, B, P, X, Z, 9, 0 (zero), asterisk (*), slant (/), comma (,), plus sign (+),
minus sign (-), or currency symbol ($) can appear more than once in a given PICTURE
clause. You can specify a number of consecutive occurrences of a symbol by using an
unsigned integer enclosed in parentheses after the symbol. For example, X(8)
indicates eight alphanumeric characters.

According to these, LPARENCHAR and RPARENCHAR do not belong in pictureChars; so, you can simply remove them.

msagca avatar Nov 03 '22 12:11 msagca

@kevinlano,

I think I misunderstood the following description (on page 158).

Symbol Function
I Each I in the character string indicates that the nonblank character immediately following it is treated as a simple insertion character. Specifying the character I as the currency symbol overrides its use to indicate simple insertion characters. The I itself is not counted in the size of the item, but the single, nonblank character following it is counted in the size of the item. The 30-character limit for the size of a PICTURE string includes both the I symbol and the character that follows it.

I thought a nonblank character following I had to be one of the given symbols but that doesn't make sense. However, I can't find in the manual what characters are allowed after I and I haven't seen a single example usage of this.

If I can be followed by ( or ), then we cannot get rid of these characters in pictureChars. I think the most elegant solution would be to split the combined grammar (see trsplit) and handle these cases in lexer modes.

msagca avatar Nov 03 '22 14:11 msagca

Yes, lexer rules are probably the way to go, indeed even a specific mode for the pictureChars contents. Apart from the above issue, the individual characters should be treated as separate tokens. At present the pictureChars are tokenised as identifiers, numbers or general Cobol tokens. Eg> 99,99 as a picture string is parsed as (pictureString (pictureChars 99,99)) Likewise for 99V99. But the individual characters are meaningful, so I would expect (pictureString (pictureChars 9) (pictureChars 9) (pictureChars ,) ....) etc

kevinlano avatar Nov 05 '22 12:11 kevinlano