Support \x escape sequence in strings

Open tynanbe opened this issue 2 years ago • 1 comments

I haven't researched this much, but it looks like both Erlang and JavaScript support the "\x" escape sequence, e.g. "\x1b[36mCYAN STRING\x1b[0m".

Currently, Gleam only allows "\e", but Node.js doesn't seem to recognize it.

We should investigate further and document findings.

tynanbe avatar Feb 25 '22 21:02 tynanbe
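
For reference, a minimal TypeScript sketch of the JavaScript side of this observation. It assumes the problem is that "\e" reaches the JavaScript output unchanged; the variable and helper names are illustrative only. JavaScript has no "\e" escape, so the backslash is dropped and the string is just the letter "e", while "\x1b" yields the ESC character.

```typescript
// ESC (U+001B) written with the \x escape from the issue text.
const viaHex: string = "\x1b";

// JavaScript has no \e escape: the backslash is ignored and the
// resulting string is simply the letter "e".
const viaE: string = "\e";

// Illustrative helper: does the string start with the ESC character?
const startsWithEsc = (s: string): boolean => s.codePointAt(0) === 0x1b;

console.log(startsWithEsc(viaHex)); // true
console.log(startsWithEsc(viaE));   // false
console.log(viaE === "e");          // true
```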

Sounds good, but we'll need to ensure that Erlang and JS have exactly the same behaviour for the new escape code.

lpil avatar Feb 26 '22 10:02 lpil

I came across \u while playing with JavaScript as well (via gleam_community/ansi), more specifically \u001b

TanklesXL avatar Feb 08 '23 21:02 TanklesXL
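
A short TypeScript sketch of how the escapes mentioned so far relate on the JavaScript side: "\x1b", "\u001b" and the braced form "\u{1b}" all denote the same ESC character there. The variable names are illustrative only.

```typescript
// Three JavaScript spellings of the same character, ESC (U+001B).
const hexEscape: string = "\x1b";
const unicodeFourDigit: string = "\u001b";
const unicodeBraced: string = "\u{1b}";

const allEqual: boolean =
  hexEscape === unicodeFourDigit && unicodeFourDigit === unicodeBraced;
console.log(allEqual); // true

// The ANSI colour example from the issue, written with \u instead of \x.
const cyan: string = "\u001b[36mCYAN STRING\u001b[0m";
console.log(cyan === "\x1b[36mCYAN STRING\x1b[0m"); // true
```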

What does that \u one do? Is it something that can't be expressed in Gleam today?

lpil avatar Feb 09 '23 19:02 lpil

> What does that \u one do? Is it something that can't be expressed in Gleam today?

\u is used for representing Unicode characters in strings without using the character literal.

> In addition, JavaScript allows using Unicode escape sequences in the form of \u0000 or \u{000000} in identifiers, which encode the same string value as the actual Unicode characters. For example, 你好 and \u4f60\u597d are the same identifiers:
>
> (MDN)

It can be useful in situations where the Unicode literal is hard to distinguish visually (invisible character, visually similar to another unicode character):

let zero_width_literal = "​"

// Does not compile today:
let zero_width_unicode = "\u200b"

Certain editors like VS Code will also complain when you use certain Unicode literals:

[Image: Screenshot 2023-05-27 at 10 57 22 PM]

maxdeviant avatar May 28 '23 02:05 maxdeviant
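
To make the readability argument concrete, here is a small TypeScript sketch (JavaScript already accepts this escape; the variable names are illustrative). With "\u200b" the zero-width space is visible in the source, whereas two strings that differ only by a literal zero-width space look identical on screen.

```typescript
// Zero-width space (U+200B) written as an escape, so it is visible in source.
const zeroWidth: string = "\u200b";

console.log(zeroWidth.length);                   // 1
console.log(zeroWidth.codePointAt(0) === 0x200b); // true

// The two strings below render the same but are not equal.
const plain: string = "ab";
const withZeroWidth: string = "a" + zeroWidth + "b";

console.log(withZeroWidth.length);    // 3
console.log(plain === withZeroWidth); // false
```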

I'm gonna take a whack at this

bcpeinhardt avatar Aug 02 '23 19:08 bcpeinhardt

So what is the design here? Is \x redundant if we have \u?

lpil avatar Aug 03 '23 11:08 lpil

Hmmm. So for starters I don't think Erlang has \u syntax (correct me if I'm wrong, people who actually speak Erlang), so the impl there isn't straightforward. I think there's probably a solid argument to scrap all of it and favor bitstrings as the way to handle all of this. Then maybe a library could provide a set of named Unicode characters to be used with <>. Going to retract my whack for now, as I have other things that require whacking.

bcpeinhardt avatar Aug 03 '23 16:08 bcpeinhardt

How about this syntax?

\u{HH}       - unicode         (U+00HH)
\u{HHHH}     - unicode         (U+HHHH)
\u{HHHHHHHH} - unicode         (U+HHHHHHHH)

lpil avatar Nov 27 '23 12:11 lpil
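
As a rough illustration of what the proposed syntax would mean, here is a TypeScript sketch that expands `\u{...}` sequences in a raw source string into the characters they name. This is not the Gleam compiler's implementation. The function name `decodeBracedEscapes` is hypothetical, and accepting any run of 1 to 8 hex digits and rejecting values above U+10FFFF are assumptions that the proposal does not spell out.

```typescript
// Hypothetical helper, not part of Gleam or any library.
// Assumption: 1 to 8 hex digits are accepted between the braces.
function decodeBracedEscapes(source: string): string {
  return source.replace(
    /\\u\{([0-9a-fA-F]{1,8})\}/g,
    (_match: string, hex: string): string => {
      const codePoint = parseInt(hex, 16);
      // Assumption: values beyond the Unicode range are an error.
      if (codePoint > 0x10ffff) {
        throw new RangeError("\\u{" + hex + "} is above U+10FFFF");
      }
      return String.fromCodePoint(codePoint);
    }
  );
}

// The backslashes are doubled so the input holds the escape text itself.
const rawSource: string = "\\u{1b}[36mCYAN STRING\\u{1b}[0m";
const decoded: string = decodeBracedEscapes(rawSource);

console.log(decoded === "\x1b[36mCYAN STRING\x1b[0m"); // true
```

One consequence of a single braced form is that "\u{1b}" covers the original "\x1b" use case, which bears on the earlier question of whether \x is redundant.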

I'm working on it right now. Doesn't seem very hard.

abs0luty avatar Dec 05 '23 07:12 abs0luty

Implemented it the way @lpil proposed.

abs0luty avatar Dec 05 '23 12:12 abs0luty