eslint-plugin-unicorn

Rule proposal: `prefer-regexp-code-point-escape`

Open sindresorhus opened this issue 4 years ago • 11 comments

Unicode code point escapes are new in ES6. They can represent any code point up to U+10FFFF, whereas the older \uXXXX escapes are limited to 16 bits, and it's better to always use them for consistency, even when they're not required.

  • https://exploringjs.com/es6/ch_unicode.html#sec_escape-sequences
  • https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point

This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.

Inspired by https://github.com/eslint/eslint/issues/12488.

Fail

const foo = '\123'; // Octal
const foo = '\cA'; // Control escape sequence
const foo = '\x7A'; // Hex
const foo = '\u2661'; // Unicode escape sequence
const foo = '\uD83D\uDCA9'; // Unicode surrogate pair

Pass

const foo = '\u{7A}';
const foo = '\u{1F4A9}';
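
As a minimal sketch of the equivalence the proposal relies on (the variable names are illustrative only), each older escape below produces exactly the same string as its code point form, so rewriting one into the other should not change behavior in string literals:

```javascript
// Each group is the same string written with different escape styles.
const hexEscape = '\x7A';
const unicodeEscape = '\u007A';
const codePointEscape = '\u{7A}';
console.log(hexEscape === codePointEscape); // true
console.log(unicodeEscape === codePointEscape); // true

// Code points above U+FFFF need a surrogate pair with \uXXXX,
// but only a single escape with \u{…}.
const surrogatePair = '\uD83D\uDCA9';
const singleEscape = '\u{1F4A9}';
console.log(surrogatePair === singleEscape); // true
console.log(singleEscape.codePointAt(0).toString(16)); // 1f4a9
```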

sindresorhus • Jan 02 '21 10:01

I was going to propose this the other day. I was thinking we could merge no-hex-escapes into the new rule.

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

fisker • Jan 02 '21 14:01

About the name, we already have better-regex, let's use better-string?

fisker • Jan 02 '21 15:01

Nice idea, I would like this rule. I think it makes sense to deprecate no-hex-escapes. I don't like the better-string name though, I think it's too vague. What about better-string-escapes?

papb • Jan 02 '21 16:01

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.

It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.

sindresorhus • Jan 03 '21 10:01

About the name, we already have better-regex, let's use better-string?

I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.

sindresorhus • Jan 03 '21 10:01

This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.

I think this rule should also handle Hex escapes.

sindresorhus • Feb 09 '21 17:02

This is now accepted.

sindresorhus • Feb 09 '21 17:02

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.

It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.

@fisker @sindresorhus

I don't really agree, because Unicode code points are conventionally written as U+XXXX with at least 4 hex digits (for example U+007A, not U+7A).

[...] notated according to the standard as U+0000–U+10FFFF

Look at the notation used in various RFCs and Wikipedia articles:

  • https://datatracker.ietf.org/doc/html/rfc8266
  • https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology
  • https://en.wikipedia.org/wiki/List_of_Unicode_characters
  • https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)

🚨 I would also enforce the \u{…} wrapper, since "\u{007A}A" is MUCH more readable than "\u007AA"!

Fail

const foo = "\u7A";
const foo = "\u007A";

Pass

const foo = "\u{007A}";
const foo = "\u{10FFFD}";
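
To make the suggested rewrite concrete, here is a rough sketch of how the raw text of a string literal could be converted. The helper name `toCodePointEscapes` and its `minDigits` parameter are hypothetical, not part of eslint-plugin-unicorn, and the sketch only covers \xXX, \uXXXX, and surrogate pairs written as two \uXXXX escapes; octal escapes and regex literals are out of scope:

```javascript
// Hypothetical helper for illustration; not part of eslint-plugin-unicorn.
// `raw` is the source text between the quotes, with escapes not yet interpreted.
// `minDigits` is an assumed option: 4 gives \u{007A}, 1 gives \u{7A}.
function toCodePointEscapes(raw, minDigits = 4) {
  const format = codePoint =>
    `\\u{${codePoint.toString(16).toUpperCase().padStart(minDigits, '0')}}`;

  // Alternatives, tried in order at each position:
  //   1. an escaped backslash (left untouched, so `\\u007A` is not rewritten)
  //   2. a surrogate pair written as two \uXXXX escapes
  //   3. a single \uXXXX escape
  //   4. a \xXX hex escape
  const pattern =
    /\\\\|\\u([dD][89abAB][\da-fA-F]{2})\\u([dD][c-fC-F][\da-fA-F]{2})|\\u([\da-fA-F]{4})|\\x([\da-fA-F]{2})/g;

  return raw.replace(pattern, (match, high, low, unit, hex) => {
    if (high !== undefined) {
      // Combine the two UTF-16 code units into one code point.
      const codePoint =
        (parseInt(high, 16) - 0xd800) * 0x400 + (parseInt(low, 16) - 0xdc00) + 0x10000;
      return format(codePoint);
    }
    if (unit !== undefined) return format(parseInt(unit, 16));
    if (hex !== undefined) return format(parseInt(hex, 16));
    return match; // escaped backslash
  });
}

console.log(toCodePointEscapes(String.raw`\u007AA`)); // \u{007A}A
console.log(toCodePointEscapes(String.raw`\uD83D\uDCA9`)); // \u{1F4A9}
console.log(toCodePointEscapes(String.raw`\x7A`, 1)); // \u{7A}
```

Keeping the padding width as a parameter leaves room for both preferences voiced in this thread: the shorter \u{7A} form and the zero-padded \u{007A} form.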

Edit: This is also consistent with RegExp notation, where \u without braces requires exactly 4 digits:

const regex = /\u007A/;   ✅ 
const regex = /\u7A/;     ❌  

yvele • Jul 11 '24 15:07

Sure, PR welcome.

fisker • Jul 11 '24 15:07

Sure, PR welcome.

Hmm, I think I'll give it a try.

About the name, we already have better-regex, let's use better-string?

I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.

As for the rule name, what about prefer-string-unicode-wrapper?

RegExp only supports the \u{…} wrapper with the u or v flag:

/\u0061/.test("a");    // true
/\u{0061}/.test("a");  // false!
/\u{0061}/u.test("a"); // true
/\u{0061}/v.test("a"); // true

Note also that

/\u{61}/u.test("a"); // true
/\u{61}/v.test("a"); // true
/\u61/.test("a");    // false!
/\u61/u.test("a");   // Uncaught SyntaxError: Invalid regular expression: /\u61/u: Invalid Unicode escape
/\u61/v.test("a");   // Uncaught SyntaxError: Invalid regular expression: /\u61/v: Invalid Unicode escape

See also https://eslint.org/docs/latest/rules/require-unicode-regexp rule that enforces either u or v flag to be used on RegExp.
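
For reference, a minimal sketch of enabling that ESLint rule in a flat config file. The file name, the CommonJS form, and the 'error' severity are assumptions for illustration, not something settled in this thread:

```javascript
// eslint.config.js (CommonJS form), assumed project setup
module.exports = [
  {
    rules: {
      // Report regular expressions that lack the u or v flag.
      'require-unicode-regexp': 'error',
    },
  },
];
```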

If we extend the rule to RegExp, the autofix also has to make sure at least the u flag is set:

/\u0061/ --> /\u{0061}/u
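
A rough sketch of what such a fix could look like, assuming the pattern source and flags are available as strings. The helper name `fixRegexEscapes` is hypothetical. Because the u flag also changes how the rest of the pattern is parsed (for example, `\a` becomes a syntax error), the sketch validates the result and gives up when the fixed pattern no longer compiles. Surrogate escapes are deliberately left alone here, since \u{D83D}\u{DCA9} does not match the same text as \uD83D\uDCA9 in Unicode mode:

```javascript
// Hypothetical helper for illustration; not part of ESLint or eslint-plugin-unicorn.
function fixRegexEscapes(pattern, flags) {
  // Skip escaped backslashes, then rewrite \uXXXX unless it is a surrogate
  // code unit (D800-DFFF), which this sketch does not attempt to handle.
  const fixedPattern = pattern.replace(
    /\\\\|\\u(?![dD][89a-fA-F])([\da-fA-F]{4})/g,
    (match, hex) => (hex === undefined ? match : `\\u{${hex.toUpperCase()}}`),
  );

  // Nothing to fix: leave the flags untouched as well.
  if (fixedPattern === pattern) return { pattern, flags };

  // \u{…} is only understood with the u or v flag.
  const fixedFlags = /[uv]/.test(flags) ? flags : `${flags}u`;

  try {
    new RegExp(fixedPattern, fixedFlags); // throws if the result is invalid
  } catch {
    return null; // not safely fixable by this sketch
  }
  return { pattern: fixedPattern, flags: fixedFlags };
}

const fixed = fixRegexEscapes(String.raw`\u0061`, '');
console.log(fixed); // { pattern: '\\u{0061}', flags: 'u' }
console.log(new RegExp(fixed.pattern, fixed.flags).test('a')); // true
console.log(fixRegexEscapes(String.raw`\u0061\a`, '')); // null
```

A real implementation would more likely work on a parsed regex AST than on the raw source, so treat the string replacement above as a simplification.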

@sindresorhus should we focus only on strings, or make a single Unicode wrapper enforcement rule for both strings and regexes? The latter will add complexity to the rule.

For both strings and regexes: prefer-unicode-wrapper ?

yvele • Jul 12 '24 11:07

If we extend the rule to RegExp we also have to make sure at least u flag is used during autofix

👍

should we only focus on string? Or make a single Unicode wrapper enforcement rule for both strings and regexes? That will make the rule more complex to code.

Both

For both strings and regexes: prefer-unicode-wrapper ?

Maybe prefer-unicode-code-point-escapes? To be explicit. https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point

sindresorhus • Jul 14 '24 00:07