eslint-plugin-unicorn
Rule proposal: `prefer-regexp-code-point-escape`
Unicode code point escapes are new in ES6. The older escapes are limited to at most four hex digits, but code point escapes can express any code point up to U+10FFFF directly. So it's better to always use them for consistency, even when they're not required.
- https://exploringjs.com/es6/ch_unicode.html#sec_escape-sequences
- https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point
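A small illustration of the "more bits" point, using only standard string behaviour: an astral character needs a surrogate pair of two `\uXXXX` escapes, while a single `\u{…}` escape covers it. The specific character (U+1F4A9) is just the example from the Fail/Pass list.

```javascript
// The same character (U+1F4A9) written two ways.
const surrogatePair = '\uD83D\uDCA9'; // two \uXXXX escapes, one per UTF-16 code unit
const codePoint = '\u{1F4A9}';        // one ES2015 code point escape

console.log(surrogatePair === codePoint);           // true
console.log(codePoint.length);                      // 2 (still two UTF-16 code units)
console.log(codePoint.codePointAt(0).toString(16)); // "1f4a9"

// \xHH only reaches U+00FF and \uXXXX only reaches U+FFFF,
// while \u{…} covers everything up to U+10FFFF with one syntax.
console.log('\x7A' === '\u{7A}' && '\u007A' === '\u{7A}'); // true
console.log('\u{10FFFF}'.codePointAt(0) === 0x10ffff);     // true
```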
This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.
Inspired by https://github.com/eslint/eslint/issues/12488.
Fail
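const foo = '\123'; // Octal (legacy syntax; a SyntaxError in strict mode)
const foo = /\cA/; // Control escape sequence (regex only; in a string literal, '\cA' is just "cA")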
const foo = '\x7A'; // Hex
const foo = '\u2661'; // Unicode escape sequence
const foo = '\uD83D\uDCA9'; // Unicode surrogate pair
Pass
const foo = '\u{7A}';
const foo = '\u{1F4A9}';
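One way a fixer could turn the string Fail cases into the Pass form is sketched below. It is a hypothetical helper, not the rule's actual implementation: it works on the raw source text of a string literal (without quotes), rewrites `\xHH`, `\uXXXX`, and `\uHHHH\uLLLL` surrogate pairs as `\u{…}`, and deliberately leaves other escapes (such as `\\` or legacy octal) untouched.

```javascript
// Hypothetical helper: rewrite escapes in the raw text of a string literal.
// Alternation order matters: surrogate pair first, then single \uXXXX, then \xHH,
// then any other escape (so an escaped backslash like \\x7A is skipped as a unit).
const ESCAPE =
  /\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})|\\x([0-9a-fA-F]{2})|\\[\s\S]/g;

function toCodePointEscapes(raw) {
  return raw.replace(ESCAPE, (match, high, low, bmp, hex) => {
    if (high !== undefined) {
      // Combine the high and low surrogates into a single code point.
      const cp =
        (parseInt(high, 16) - 0xd800) * 0x400 + (parseInt(low, 16) - 0xdc00) + 0x10000;
      return `\\u{${cp.toString(16).toUpperCase()}}`;
    }
    const digits = bmp ?? hex;
    if (digits !== undefined) {
      // Drop leading zeros and uppercase: \u007A and \x7A both become \u{7A}.
      return `\\u{${parseInt(digits, 16).toString(16).toUpperCase()}}`;
    }
    return match; // \\, \n, \', legacy octal, ... are left as-is
  });
}

console.log(toCodePointEscapes('\\x7A'));          // \u{7A}
console.log(toCodePointEscapes('\\u2661'));        // \u{2661}
console.log(toCodePointEscapes('\\uD83D\\uDCA9')); // \u{1F4A9}
```

Matching the whole escape sequence (rather than searching for `\u` anywhere) is what keeps an escaped backslash followed by `x7A` from being misread as a hex escape.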
I was going to make a proposal on this the other day; I was thinking of merging no-hex-escapes into the new one.
And I prefer const foo = '\u007A'; over const foo = '\u{7A}';
About the name, we already have better-regex, so how about better-string?
Nice idea, I would like this rule. I think it makes sense to deprecate no-hex-escapes. I don't like the better-string name though, I think it's too vague. What about better-string-escapes?
And I prefer const foo = '\u007A'; over const foo = '\u{7A}';
Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.
It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.
About the name, we already have better-regex, so how about better-string?
I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.
This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.
I think this rule should also handle Hex escapes.
This is now accepted.
And I prefer const foo = '\u007A'; over const foo = '\u{7A}';
Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.
It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.
@fisker @sindresorhus
I don't really agree, because Unicode code points are conventionally written as U+0000, with at least four hex digits:
[...] notated according to the standard as U+0000–U+10FFFF
Look at the notation used in various RFCs and Wikipedia articles:
- https://datatracker.ietf.org/doc/html/rfc8266
- https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology
- https://en.wikipedia.org/wiki/List_of_Unicode_characters
- https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
🚨 I would also enforce the \u{…} wrapper, since "\u{007A}A" is MUCH more readable than "\u007AA"!
Fail
const foo = "\u7A"; // Not even valid syntax: \u without braces needs exactly four hex digits
const foo = "\u007A";
Pass
const foo = "\u{007A}";
const foo = "\u{10FFFD}";
Edit: Also for consistency with RegExp notation, which requires 4 digits:
const regex = /\u007A/; ✅
const regex = /\u7A/; ❌
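The padded style proposed above (always `\u{…}`, at least four hex digits, like U+0000 notation) could be produced by a formatter along these lines. `formatCodePointEscape` and its `minDigits` parameter are hypothetical names for illustration; `minDigits = 1` gives the unpadded `\u{7A}` style from the original proposal.

```javascript
// Hypothetical formatter: always \u{…}, padded to at least `minDigits` hex digits.
function formatCodePointEscape(codePoint, minDigits = 4) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError(`Not a Unicode code point: ${codePoint}`);
  }
  const hex = codePoint.toString(16).toUpperCase().padStart(minDigits, '0');
  return `\\u{${hex}}`;
}

console.log(formatCodePointEscape(0x7a));     // \u{007A}
console.log(formatCodePointEscape(0x10fffd)); // \u{10FFFD}
console.log(formatCodePointEscape(0x7a, 1));  // \u{7A}

// The braces mark where the escape ends; both literals are the string "zA".
console.log("\u{007A}A" === "\u007AA"); // true
```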
Sure, PR welcome.
Sure, PR welcome.
Hmm... I think I'll give it a try.
About the name, we already have better-regex, so how about better-string?
I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.
About the rule name, what about prefer-string-unicode-wrapper?
Regexes only support the Unicode code point wrapper with the u or v flag:
/\u0061/.test("a"); // true
/\u{0061}/.test("a"); // false!
/\u{0061}/u.test("a"); // true
/\u{0061}/v.test("a"); // true
Note also that
/\u{61}/u.test("a"); // true
/\u{61}/v.test("a"); // true
/\u61/.test("a"); // false!
/\u61/u.test("a"); // Uncaught SyntaxError: Invalid regular expression: /\u61/u: Invalid Unicode escape
/\u61/v.test("a"); // Uncaught SyntaxError: Invalid regular expression: /\u61/v: Invalid Unicode escape
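The flag-dependent results listed above can be checked with `new RegExp`. This sketch only exercises the u flag, assuming the v flag (which needs a newer engine, e.g. Node.js 20+) behaves the same way for these patterns, as the examples above show.

```javascript
// Test whether a pattern built from `source` and `flags` matches "a".
const matches = (source, flags) => new RegExp(source, flags).test('a');

console.log(matches('\\u0061', ''));    // true
console.log(matches('\\u{0061}', ''));  // false: parsed as "u" repeated 61 times
console.log(matches('\\u{0061}', 'u')); // true
console.log(matches('\\u{61}', 'u'));   // true
console.log(matches('\\u61', ''));      // false: parsed as the literal text "u61"

try {
  new RegExp('\\u61', 'u');
} catch (error) {
  // With the u flag, \u needs exactly four hex digits or braces.
  console.log(error instanceof SyntaxError); // true
}
```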
See also the https://eslint.org/docs/latest/rules/require-unicode-regexp rule, which enforces the use of either the u or the v flag on regexes.
If we extend the rule to regexes, we also have to make sure the autofix adds at least the u flag:
/\u0061/ --> /\u{0061}/u
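A rough sketch of that autofix on a regex's `source` and `flags` strings follows. `fixRegexEscapes` is a hypothetical name, not the rule's real fixer. Besides adding the u flag, it has to merge `\uHHHH\uLLLL` surrogate pairs, because under the u flag `\u{D83D}\u{DCA9}` would mean two lone surrogates rather than one character. It assumes the rest of the pattern means the same thing with the u flag; a real fixer would need to check that, since the u flag makes the syntax stricter and changes how astral characters are matched.

```javascript
// Hypothetical autofix sketch: \uXXXX -> \u{XXXX} in a regex source, plus the u flag.
const REGEX_ESCAPE =
  /\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})|\\[\s\S]/g;

function fixRegexEscapes(source, flags) {
  const fixedSource = source.replace(REGEX_ESCAPE, (match, high, low, bmp) => {
    if (high !== undefined) {
      // Merge the surrogate pair into one code point escape.
      const codePoint =
        (parseInt(high, 16) - 0xd800) * 0x400 + (parseInt(low, 16) - 0xdc00) + 0x10000;
      return `\\u{${codePoint.toString(16).toUpperCase()}}`;
    }
    if (bmp !== undefined) {
      return `\\u{${bmp.toUpperCase()}}`; // keeps four digits, e.g. \u{0061}
    }
    return match; // other escapes, including \\, are kept as-is
  });
  // Add u only when neither u nor v is already present.
  const fixedFlags = /[uv]/.test(flags) ? flags : flags + 'u';
  return { source: fixedSource, flags: fixedFlags };
}

const fixed = fixRegexEscapes('\\u0061', '');
console.log(`/${fixed.source}/${fixed.flags}`);               // /\u{0061}/u
console.log(new RegExp(fixed.source, fixed.flags).test('a')); // true
```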
@sindresorhus should we only focus on strings? Or make a single Unicode wrapper enforcement rule for both strings and regexes? That will add complexity to the rule.
For both strings and regexes: prefer-unicode-wrapper?
If we extend the rule to regexes, we also have to make sure the autofix adds at least the u flag
👍
should we only focus on strings? Or make a single Unicode wrapper enforcement rule for both strings and regexes? That will make the rule more complex to code.
Both
For both strings and regexes: prefer-unicode-wrapper?
Maybe prefer-unicode-code-point-escapes, to be explicit? https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point