Hex character escapes
Right now it looks like we don't support escapes of the form "\x12", "\u1234", and "\U12345678". These probably aren't hard to add to unescapeString and escapeString, but some thought should be put into what our Unicode guarantees for strings actually are. Should we allow the string "\ud800" (a lone high surrogate), for example? If so, what should we do when trying to write it out as UTF-8?
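To make the "\ud800" question concrete, here is a minimal Python sketch of the decoding side. The helper name `unescape_hex` is hypothetical and is not our unescapeString; it handles only the three forms above, and it assumes that "\x12" denotes the code point U+0012 rather than a raw byte (which is itself one of the open questions). It suggests that a permissive decoder will happily produce a lone surrogate, and that the trouble only shows up when writing UTF-8.

```python
import re

# Hypothetical helper for illustration only; not the project's unescapeString.
# Matches \xHH, \uHHHH and \UHHHHHHHH.
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8}))"
)


def unescape_hex(text: str) -> str:
    def replace(match: re.Match) -> str:
        digits = match.group(1) or match.group(2) or match.group(3)
        # Assumption: every escape names a code point.
        # chr() raises ValueError above 0x10FFFF but accepts surrogates.
        return chr(int(digits, 16))

    return _ESCAPE.sub(replace, text)


# A lone high surrogate: a code point, but not a Unicode scalar value.
lone = unescape_hex(r"\ud800")

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    # Strict UTF-8 refuses to encode surrogates.
    strict_ok = False

# The "surrogatepass" error handler emits the generalized three-byte form
# instead, which is the same byte sequence WTF-8 uses for this code point.
lenient = lone.encode("utf-8", "surrogatepass")
```

So the two obvious options when writing out are to reject the string (the strict behavior) or to emit a WTF-8 style encoding (the lenient behavior); which one we want depends on the answer to the question below.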
I guess this comes down to: what is a string?
- A sequence of 8-bit unsigned integers
- A sequence of 16-bit unsigned integers
- A sequence of Unicode code points
- A sequence of Unicode scalar values
- A sequence of Unicode grapheme clusters
- Something else?
See https://simonsapin.github.io/wtf-8/#motivation for some background.
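The definitions in the list above really do disagree about the same text. As a rough illustration (a sketch using Python's standard library, not a proposal for our implementation), here is one short string measured under each of them. The grapheme count is only a crude approximation that skips combining marks; proper segmentation follows UAX #29 and is not in the standard library.

```python
import unicodedata

# "e" + combining acute accent, followed by an emoji outside the BMP.
s = "e\u0301\U0001F600"

# Sequence of 8-bit units (UTF-8 bytes).
utf8_units = len(s.encode("utf-8"))

# Sequence of 16-bit units (UTF-16 code units; the emoji needs a surrogate pair).
utf16_units = len(s.encode("utf-16-le")) // 2

# Sequence of code points (this is what a Python str is).
code_points = len(s)

# Scalar values are code points minus the surrogate range U+D800..U+DFFF.
all_scalar = all(not 0xD800 <= ord(c) <= 0xDFFF for c in s)

# Crude approximation of grapheme clusters: count non-combining code points.
# This is NOT a UAX #29 implementation.
approx_graphemes = sum(1 for c in s if unicodedata.combining(c) == 0)

print(utf8_units, utf16_units, code_points, approx_graphemes)  # prints "7 4 3 2"
```

Only the code point definition (and the 16-bit one) can represent "\ud800" at all; the scalar value definition rules it out by construction.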
My first thought, without too much careful consideration, was "just do what Java does".
But apparently Java translates Unicode escape sequences anywhere in a source file (not just inside string literals) into the equivalent characters before parsing? See https://javajee.com/unicode-escapes-in-java
That seems a bit strange (and it would cause issues with source locations), and I don't see a real advantage to doing it this way anyway. So I don't know.