Incorrect encoding for alphanumeric literals using hexadecimal notation

Open fm-117 opened this issue 7 months ago • 3 comments

What is the problem ?

The scanner uses the MultilineScanState.EncodingForAlphanumericLiterals property to compute the string value of alphanumeric literals written in hexadecimal notation. However, this property takes its value from the encoding of the source file, which is a different notion: the source file encoding describes how the program text is stored, not which code page the hexadecimal bytes of a literal stand for.
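To make the mismatch concrete, here is a small Python illustration (not TypeCobol code). It decodes the bytes of the literal X'C1C2C3' twice: once with an EBCDIC code page and once with a Windows code page of the kind a source file might be saved in. The choice of cp037 and cp1252 is an assumption made for the example only.

```python
# Bytes of the COBOL literal X'C1C2C3'; they do not depend on how the source file is stored.
raw = bytes.fromhex("C1C2C3")

# Decoded with an EBCDIC code page (cp037 is picked only because Python ships it).
as_ebcdic = raw.decode("cp037")

# Decoded with a typical Windows source-file encoding instead.
as_cp1252 = raw.decode("cp1252")

print(as_ebcdic)
print(as_cp1252)
```

The first line printed is ABC and the second is ÁÂÃ, so reusing the source file encoding silently changes the value of the literal.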

Here are the IBM specs for alphanumeric literals written in hex:

Hexadecimal digits are characters in the range '0' to '9', 'a' to 'f', and 'A' to 'F', inclusive. Two hexadecimal digits represent one character in a single-byte character set (EBCDIC or ASCII). Four hexadecimal digits represent one character in a DBCS character set. A string of EBCDIC DBCS characters represented in hexadecimal notation must be preceded by the hexadecimal representation of a shift-out control character (X'0E') and followed by the hexadecimal representation of a shift-in control character (X'0F'). An even number of hexadecimal digits must be specified. The maximum length of a hexadecimal literal is 320 hexadecimal digits.

The continuation rules are the same as those for any alphanumeric literal. The opening delimiter (X" or X') cannot be split across lines.

The DBCS compiler option has no effect on the processing of hexadecimal notation of alphanumeric literals.
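The lexical rules quoted above (hexadecimal digits only, an even number of digits, at most 320 digits) could be checked along these lines. This is a hedged Python sketch, not the scanner's actual code; the function name and error messages are hypothetical, and it ignores continuation lines and the DBCS shift-out/shift-in rule.

```python
import string

MAX_HEX_DIGITS = 320  # maximum literal length from the IBM text quoted above


def check_hex_digits(digits: str) -> list[str]:
    """Return the rule violations found in the digits between the X'...' delimiters."""
    errors = []
    # string.hexdigits is '0123456789abcdefABCDEF', matching the allowed ranges.
    if any(ch not in string.hexdigits for ch in digits):
        errors.append("non-hexadecimal character")
    if len(digits) % 2 != 0:
        errors.append("odd number of hexadecimal digits")
    if len(digits) > MAX_HEX_DIGITS:
        errors.append("more than 320 hexadecimal digits")
    return errors


print(check_hex_digits("C1c2"))
print(check_hex_digits("C1C"))
```

None of these checks depends on an encoding; the encoding only matters once the validated digits are converted to characters.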

How to fix ?

  • Clients should be able to specify both the encoding of their source files and the encoding used to decode hex literals, so we need a new option
  • A sensible default value should be chosen for this new option
    • EBCDIC 1147 is the default in our company, but it may not be the most widely used encoding
    • Whichever default value is chosen, this will be a breaking change
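One possible shape for the new option, sketched in Python rather than in TypeCobol's own code. The class, field, and function names are hypothetical, and cp1140 is used only as a stand-in EBCDIC code page available in Python's standard library; it is not a proposal for the actual default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannerEncodingOptions:
    # Hypothetical option names, introduced for this sketch only.
    source_encoding: str = "utf-8"        # how source files are read from disk
    hex_literal_encoding: str = "cp1140"  # placeholder default for X'...' literals


def decode_hex_literal(digits: str, options: ScannerEncodingOptions) -> str:
    """Decode the digits of X'...' with the dedicated hex-literal encoding."""
    return bytes.fromhex(digits).decode(options.hex_literal_encoding)


default_options = ScannerEncodingOptions()
ascii_options = ScannerEncodingOptions(hex_literal_encoding="ascii")

print(decode_hex_literal("C8C9", default_options))
print(decode_hex_literal("4849", ascii_options))
```

Both calls print HI: the same text is expressed with different hex digits depending on the target code page, which is why the hex-literal encoding has to be configurable separately from the source encoding.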

fm-117, Jul 04 '24 08:07