Hex literals can be invalid UTF8
The compiler accepts hex literals such as 'FFFFFF'x, and these are stored as UTF-8 strings.
Unfortunately, if the hex literal does not encode valid UTF-8, the character-handling rxas instructions fail later on. The example above is itself such a case: the byte 0xFF can never appear in a well-formed UTF-8 sequence.
Actions
- The compiler needs to validate that these strings are well-formed UTF-8 (see the sketch after this list)
- Level b needs to support a binary datatype
- Level c will need an approach to distinguish (transparently to the user) between binary and UTF-8 data
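As a rough illustration of the first action item, here is a minimal sketch of the kind of well-formedness check the compiler could run over a hex literal's bytes before storing them as a string. The function name and the point where it would be hooked into the compiler are assumptions, not existing code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Checks lead bytes, continuation bytes, overlong encodings, UTF-16 surrogates
 * and the U+10FFFF upper bound. */
static int utf8_is_valid(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        size_t n;          /* number of continuation bytes expected */
        uint32_t cp, min;  /* decoded code point and its minimum legal value */

        if (b < 0x80) { i++; continue; }                       /* ASCII */
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;     /* stray continuation byte or 0xF8..0xFF (e.g. 0xFF) */

        if (i + n >= len) return 0;                  /* truncated sequence */
        for (size_t j = 1; j <= n; j++) {
            if ((buf[i + j] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + j] & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range */
        i += n + 1;
    }
    return 1;
}
```

With a check like this, 'FFFFFF'x would be rejected (or flagged for binary handling) at compile time instead of failing inside an rxas instruction at run time.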
Introducing a dedicated binary type feels like overkill, given how limited its actual use case appears to be.
If we restrict the UTF-8 string to just the characters 0–9 and a–f, then each character is a single-byte UTF-8 code point (i.e., standard ASCII). We could define a new instruction that converts such a string into a plain char array. If the string contains invalid characters, we could raise an error — potentially even at compile time. From my perspective, adding a full binary data type might introduce more complexity than we need at this stage.
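To make the proposed conversion instruction concrete, here is a hedged C sketch that packs a restricted 0-9/a-f string into a plain byte array and reports invalid characters. The names are hypothetical and the sketch ignores Rexx-specific details such as embedded blanks in hex literals:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value of a hex digit, or -1 if c is not in 0-9 / a-f / A-F. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Packs a hex-digit string (e.g. "ffffff") into out, two digits per byte.
 * Returns the number of bytes written, or -1 on an odd length or an invalid
 * character -- the error that could be raised at compile time. */
static long hex_to_bytes(const char *hex, size_t len, uint8_t *out) {
    if (len % 2 != 0) return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_nibble(hex[i]);
        int lo = hex_nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
    return (long)(len / 2);
}
```

Because the source string contains only ASCII hex digits, it is always valid UTF-8, and the binary payload only exists as a char array produced on demand by the new instruction.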
I am not against a BYTE type, which I use regularly in NetRexx (Java's byte type, that is).