Hex literals can be invalid UTF8
The compiler accepts hex literals such as 'FFFFFF'x, and these are stored as UTF-8 strings.
Unfortunately, if the hex literal does not encode valid UTF-8, the character-handling rxas instructions fail later on. The example above is itself such a case: the byte 0xFF can never appear in a well-formed UTF-8 sequence.
Actions
- The compiler needs to validate that these strings are well-formed UTF-8 (see the sketch after this list)
- Level b needs to support a binary datatype
- Level c will need an approach to distinguish (transparently to the user) between binary and UTF-8 data
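As a rough illustration of the first action item, here is a minimal sketch of the kind of well-formedness check the compiler could run over a hex literal's bytes before storing them as a string. The function name and the point where it would be hooked into the compiler are assumptions, not existing code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Checks lead bytes, continuation bytes, overlong encodings, UTF-16 surrogates
 * and the U+10FFFF upper bound. */
static int utf8_is_valid(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        size_t n;          /* number of continuation bytes expected */
        uint32_t cp, min;  /* decoded code point and its minimum legal value */

        if (b < 0x80) { i++; continue; }                       /* ASCII */
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;     /* stray continuation byte or 0xF8..0xFF (e.g. 0xFF) */

        if (i + n >= len) return 0;                  /* truncated sequence */
        for (size_t j = 1; j <= n; j++) {
            if ((buf[i + j] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + j] & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range */
        i += n + 1;
    }
    return 1;
}
```

With a check like this, 'FFFFFF'x would be rejected (or flagged for binary handling) at compile time instead of failing inside an rxas instruction at run time.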
Introducing a dedicated binary type feels like overkill, given how limited its actual use case appears to be.
If we restrict the UTF-8 string to just the characters 0–9 and a–f, then each character is a single-byte UTF-8 code point (i.e., standard ASCII). We could define a new instruction that converts such a string into a plain char array. If the string contains invalid characters, we could raise an error — potentially even at compile time. From my perspective, adding a full binary data type might introduce more complexity than we need at this stage.
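To make the proposed conversion instruction concrete, here is a hedged C sketch that packs a restricted 0-9/a-f string into a plain byte array and reports invalid characters. The names are hypothetical and the sketch ignores Rexx-specific details such as embedded blanks in hex literals:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value of a hex digit, or -1 if c is not in 0-9 / a-f / A-F. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Packs a hex-digit string (e.g. "ffffff") into out, two digits per byte.
 * Returns the number of bytes written, or -1 on an odd length or an invalid
 * character -- the error that could be raised at compile time. */
static long hex_to_bytes(const char *hex, size_t len, uint8_t *out) {
    if (len % 2 != 0) return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_nibble(hex[i]);
        int lo = hex_nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
    return (long)(len / 2);
}
```

Because the source string contains only ASCII hex digits, it is always valid UTF-8, and the binary payload only exists as a char array produced on demand by the new instruction.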
I am not against a BYTE type, which I use regularly in NetRexx (Java's byte type, that is).