Cesium
Character unescaping improvements
Some issues with the current code in Cesium.CodeGen.Ir.Expressions.Constants.CharConstant.UnescapeCharacter and Cesium.Parser.TokenExtensions.UnwrapStringLiteral:
- [ ] There are two of them, with different implementations. There should be only one.
- [ ] `UnescapeCharacter` doesn't support `\u` and `\U`, aka `universal-character-name` from the standard.
- [ ] `UnescapeCharacter` also has a bug in handling octal and hex sequences: both are considered to have only two digits, with special treatment of `\0`. The standard, however, defines octal sequences to be one, two, or three digits long, while hex escapes are of arbitrary length.
- [ ] `\0` should not be a special case in either of the methods; it is just an octal number.
- [ ] `UnwrapStringLiteral` also seems to treat octal sequences weirdly: I only see support for octal numbers starting with `0`, which is not correct (`UnescapeCharacter` handles these better).
- [ ] Normal compiler behavior is to report a warning on an invalid sequence (e.g. `\m`) and treat it as the character itself. We don't do this: we either silently accept or break on such sequences.
See also #295.
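
To make the expected behavior concrete, here is a rough, language-neutral sketch of what a single shared unescaping routine could look like. It is written in Python only for brevity and is not Cesium's code: the function name `unescape`, its `(values, warnings)` return shape, and the warning texts are all made up for illustration. It also skips things a real implementation would need, such as checking that a hex or octal value fits the target character type and that a universal character name designates an allowed code point.

```python
# Simple escapes defined by the C standard, mapped to their character values.
SIMPLE_ESCAPES = {
    "'": 0x27, '"': 0x22, "?": 0x3F, "\\": 0x5C,
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A,
    "r": 0x0D, "t": 0x09, "v": 0x0B,
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(body):
    """Decode the text between the quotes of a C character or string literal.

    Returns (values, warnings): a list of integer character values and a list
    of warning messages. Hypothetical helper, not Cesium's actual API.
    """
    values, warnings = [], []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            values.append(ord(ch))
            i += 1
            continue
        i += 1  # skip the backslash
        if i == n:
            warnings.append("dangling backslash at end of literal")
            values.append(ord("\\"))
            break
        ch = body[i]
        if ch in SIMPLE_ESCAPES:
            values.append(SIMPLE_ESCAPES[ch])
            i += 1
        elif ch in OCTAL_DIGITS:
            # One to three octal digits; \0 falls out of this rule naturally.
            end = i
            while end < n and end - i < 3 and body[end] in OCTAL_DIGITS:
                end += 1
            values.append(int(body[i:end], 8))
            i = end
        elif ch == "x":
            # Any number of hex digits, but at least one.
            end = i + 1
            while end < n and body[end] in HEX_DIGITS:
                end += 1
            if end == i + 1:
                warnings.append("\\x used with no following hex digits")
                values.append(ord("x"))
            else:
                values.append(int(body[i + 1:end], 16))
            i = end
        elif ch in "uU":
            # Universal character name: exactly 4 (\u) or 8 (\U) hex digits.
            width = 4 if ch == "u" else 8
            digits = body[i + 1:i + 1 + width]
            if len(digits) == width and all(d in HEX_DIGITS for d in digits):
                values.append(int(digits, 16))
                i += 1 + width
            else:
                warnings.append("incomplete universal character name \\" + ch)
                values.append(ord(ch))
                i += 1
        else:
            # Unknown escape such as \m: warn and keep the character itself.
            warnings.append("unknown escape sequence \\" + ch)
            values.append(ord(ch))
            i += 1
    return values, warnings
```

With this shape, `unescape(r"\1234")` yields the values for `\123` followed by a literal `4`, `unescape(r"\0")` yields a single zero without any dedicated branch, and `unescape(r"\m")` yields `m` plus one warning. Both `UnescapeCharacter` and `UnwrapStringLiteral` could then be thin wrappers over one such routine, with the character-constant path additionally checking how many values came back.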