Hex character escapes

Open remexre opened this issue 4 years ago • 2 comments

Right now, it looks like we don't support escapes of the form "\x12", "\u1234", and "\U12345678". These probably aren't hard to add in unescapeString and escapeString, but some thought should be put into what our Unicode guarantees for strings actually are. Should we allow the string "\ud800", for example? If so, what should we do when trying to write it out as UTF-8?
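
To make the "\ud800" question concrete, here is a minimal Python sketch, not the actual unescapeString. The helper name `unescape_hex` is hypothetical, and treating "\x12" as the code point U+0012 rather than a raw byte is an assumption. It decodes the three escape forms and shows that a lone surrogate survives decoding but cannot be written as strict UTF-8.

```python
import re

# Hypothetical helper for illustration only; this is not Silver's unescapeString.
# An escaped backslash is matched first so that "\\x41" is left alone.
_ESCAPE = re.compile(
    r"\\(?:\\|x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)

def unescape_hex(s: str) -> str:
    """Replace \\xHH, \\uHHHH and \\UHHHHHHHH with the code point they name."""
    def repl(m: re.Match) -> str:
        digits = m.group(1) or m.group(2) or m.group(3)
        if digits is None:
            # Escaped backslash: leave it for whatever handles other escapes.
            return m.group(0)
        value = int(digits, 16)
        if value > 0x10FFFF:
            raise ValueError(f"escape out of Unicode range: {m.group(0)}")
        # Assumption: every escape names a code point, surrogates included.
        return chr(value)
    return _ESCAPE.sub(repl, s)

lone = unescape_hex(r"\ud800")  # one code point, but not a scalar value

try:
    lone.encode("utf-8")
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False  # strict UTF-8 rejects lone surrogates

# The WTF-8 style "generalized" encoding of the same code point.
generalized = lone.encode("utf-8", "surrogatepass")  # b'\xed\xa0\x80'
```

If strings are defined as sequences of scalar values, the simplest option is probably to reject "\ud800" at unescape time. If they are sequences of code points, the writer has to pick between failing, substituting U+FFFD, or emitting the generalized form.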

I guess this comes down to: what is a string?

  • A sequence of 8-bit unsigned integers
  • A sequence of 16-bit unsigned integers
  • A sequence of Unicode code points
  • A sequence of Unicode scalar values
  • A sequence of Unicode grapheme clusters
  • Something else?
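
As a rough illustration of how much the options above differ, the Python sketch below measures one sample string under the first four definitions. The sample string and the function names are made up for this example. Grapheme clusters are left out because the Python standard library has no segmentation API.

```python
# Sample: "e" + COMBINING ACUTE ACCENT (U+0301) + GRINNING FACE (U+1F600).
sample = "e\u0301\U0001F600"

def utf8_units(s: str) -> int:
    """Length as a sequence of 8-bit units (UTF-8 encoded)."""
    return len(s.encode("utf-8"))

def utf16_units(s: str) -> int:
    """Length as a sequence of 16-bit units (UTF-16 encoded, no BOM)."""
    return len(s.encode("utf-16-le")) // 2

def code_points(s: str) -> int:
    """Length as a sequence of code points (what Python's len reports)."""
    return len(s)

def is_scalar_value(cp: int) -> bool:
    """A scalar value is any code point outside the surrogate range."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

print(utf8_units(sample))    # 7
print(utf16_units(sample))   # 4
print(code_points(sample))   # 3
print(all(is_scalar_value(ord(c)) for c in sample))  # True
print(is_scalar_value(0xD800))                       # False
```

A reader would see this sample as two user-perceived characters, so all five definitions give a different answer, or in the case of code points versus scalar values, differ only once surrogates show up.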

remexre, Apr 29 '21 22:04

https://simonsapin.github.io/wtf-8/#motivation for some background

remexre, Apr 29 '21 22:04

My first thought, without too much careful consideration, was "just do what Java does".

But apparently Java converts Unicode escape sequences anywhere in a source file to the equivalent characters before parsing? https://javajee.com/unicode-escapes-in-java

That seems a bit strange (and would cause issues with source locations), and I don't see a real advantage to doing it this way anyway. So IDK.
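
The location problem can be sketched in a few lines of Python. `pretranslate` is a hypothetical, deliberately simplified stand-in for a Java-style pre-pass: it rewrites every `\uXXXX` and ignores Java's rules about preceding backslashes. After the pass, every column to the right of an escape no longer matches the file on disk.

```python
import re

_UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def pretranslate(source: str) -> str:
    """Simplified pre-pass: replace each \\uXXXX with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

original = r'x = "\u0041"; y'          # the text as it appears in the file
translated = pretranslate(original)    # what the parser would actually see

col_before = original.index("y")       # column of `y` in the file
col_after = translated.index("y")      # column of `y` after the pre-pass

print(translated)             # x = "A"; y
print(col_before, col_after)  # 14 9
```

Any error reported at the parser's column 9 would have to be mapped back to column 14, so a pre-pass like this needs an offset table alongside it. Handling the escapes inside unescapeString avoids that, since locations are computed on the untouched source.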

krame505, Apr 30 '21 01:04