
Add Support for Features in the 2018 ECMAScript specification

Open Lonniebiz opened this issue 5 years ago • 15 comments

The 2018 ECMAScript specification includes four new regular expression features:

  1. the dotAll flag,
  2. named capture groups,
  3. unicode property escapes,
  4. and look-behind assertions
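For reference, here is a minimal sketch of the four features in plain JavaScript. The sample strings and variable names are illustrative only, and the snippet assumes an engine that already implements ES2018 regular expressions.

```javascript
// 1. dotAll flag (s): "." also matches line terminators.
const withoutDotAll = /a.b/.test("a\nb"); // false
const withDotAll = /a.b/s.test("a\nb");   // true

// 2. Named capture groups: captures are exposed on match.groups.
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("2018-09");
const year = dateMatch.groups.year;       // "2018"
const month = dateMatch.groups.month;     // "09"

// 3. Unicode property escapes (these require the u flag).
const greekLetters = "αβγ abc".match(/\p{Script=Greek}+/u)[0]; // "αβγ"

// 4. Look-behind assertions: positive (?<=...) and negative (?<!...).
const afterDollar = "cost: $42".match(/(?<=\$)\d+/)[0];     // "42"
const notAfterDollar = "€15 $42".match(/(?<!\$)\b\d+/)[0];  // "15"
```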

I'd love to see these features also supported in the internet's best RegEx tool.

Lonniebiz avatar Sep 26 '18 22:09 Lonniebiz

Here's a good article before diving into the specification:

Flags: https://flaviocopes.com/javascript-regular-expressions/#flags

Named Capture Groups: https://flaviocopes.com/javascript-regular-expressions/#named-capturing-groups

Unicode Property Escapes: https://flaviocopes.com/javascript-regular-expressions/#unicode-property-escapes

Look-behind Assertions: https://flaviocopes.com/javascript-regular-expressions/#lookbehinds-match-a-string-depending-on-what-precedes-it

Lonniebiz avatar Sep 26 '18 23:09 Lonniebiz

Any progress on this? It's been a year, my friend. I still use your tool, but I have to use another tool some of the time (when I need those look-behind assertions).

Lonniebiz avatar Sep 30 '19 06:09 Lonniebiz

Most major browsers support this now. I really wish these features were implemented.

Celsiusss avatar Oct 04 '19 18:10 Celsiusss

Thanks for the reminder. I'll take a look at this in the next little bit. It may be as easy as updating the language config, though I should probably add warnings for these features, since older browsers don't support them.
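One possible way to decide when such a warning is needed is runtime feature detection: an engine that lacks a feature throws a `SyntaxError` when asked to compile a pattern or flag that uses it. The sketch below is only an illustration of that idea, not how RegExr does it; `supportsPattern` and `es2018Support` are hypothetical names.

```javascript
// Hypothetical helper: true if this engine can compile the pattern and flags.
function supportsPattern(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    // Unsupported syntax or flags surface as SyntaxError; rethrow anything else.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// One probe per ES2018 feature discussed in this issue.
const es2018Support = {
  dotAll: supportsPattern(".", "s"),
  namedGroups: supportsPattern("(?<name>a)"),
  propertyEscapes: supportsPattern("\\p{Lu}", "u"),
  lookbehind: supportsPattern("(?<=a)b"),
};
```

Patterns are passed as strings to the `RegExp` constructor on purpose: a regex literal using unsupported syntax would fail when the script is parsed, before any `try` block could catch it.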

gskinner avatar Oct 04 '19 18:10 gskinner

I just pushed updates that add support for dotall and lookbehind. Property Escapes and named groups are going to take a bit more work, because I need to modify some parsing logic for them.

Once I do that and test, I'll push it live to the site.

Additional references: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges

Another quick note: It looks like unicode property escapes in JS support \P{etc}, but not \p{^etc} syntax, so I'll need to add another item to the flavor profile.
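A short sketch of that difference, assuming an ES2018-capable engine (the sample string and variable names are illustrative):

```javascript
// \P{...} (capital P) is the negated form of a property escape.
const viaCapitalP = "αβγabc".match(/\P{Script=Greek}+/u)[0]; // "abc"

// A negated character class is an equivalent spelling.
const viaNegatedClass = "αβγabc".match(/[^\p{Script=Greek}]+/u)[0]; // "abc"

// The caret form is rejected: with the u flag it is a SyntaxError.
let caretFormError = null;
try {
  new RegExp("\\p{^Script=Greek}", "u");
} catch (err) {
  caretFormError = err;
}
const caretFormRejected = caretFormError instanceof SyntaxError; // true
```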

gskinner avatar Oct 07 '19 00:10 gskinner

Another quick update: pushed support for unicode property escapes. Still need to add named groups & update docs appropriately for all of these features.

Update: Named group support is in. Next up: docs.

Update: Docs are in. Will do some testing & I might tweak a couple other things, then publish these updates to the live site this weekend.

gskinner avatar Oct 11 '19 16:10 gskinner

A v3.7.0 build is available for testing on: https://beta.regexr.com

This includes support for named capture groups, unicode property escapes, lookbehind, and the dotall flag. Additionally, tooltips for tokens with warnings will now show both the token information and the warning.

I'm going to try to update the unicode script / code list for javascript so that it's accurate, since it doesn't follow the PCRE standards. I'm also going to address #320 before pushing this public.

Please take a look and let me know if you run into any issues. Note that your accounts won't work properly in the beta environment, so don't do any real work there.

gskinner avatar Oct 15 '19 16:10 gskinner

Ok. I've pushed this support live, but I'm going to leave this open because I still need to track down a list of valid unicode property values for javascript and add them in. For now it's still using the PCRE list, which is better than nothing, but isn't ideal.

gskinner avatar Oct 16 '19 04:10 gskinner

Are you referring to what is called Unicode Property Escapes by MDN?

Is this what you're needing?:

  • https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
  • https://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt
  • https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt

Parent Folder of above files: https://www.unicode.org/Public/UCD/latest/ucd/

Here's how I reached the link above. I found this article in a Google Search: https://2ality.com/2017/07/regexp-unicode-property-escapes.html#unicode-character-properties

That led me to this PDF document: http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf#page=25

On the 25th page of that document, which is labeled page 96, it says:

A list of the values associated with encoded character properties in the Unicode Standard can be found in PropertyValueAliases.txt in the Unicode Character Database.

I believe the quote above is referring to this document: https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
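If that file turns out to be the right source, its entries are semicolon-separated fields (property alias, short value name, long value name) with `#` starting a comment. Below is a rough sketch of reading it; the `sample` text is a short hand-written excerpt in that format rather than the real file, and `parsePropertyValueAliases` is a hypothetical helper name. Note that JavaScript only accepts `General_Category`, `Script`, and `Script_Extensions` as non-binary properties, so only the `gc` and `sc` entries would be relevant.

```javascript
// Hand-written excerpt in the PropertyValueAliases.txt format (not the full file).
const sample = `
# General_Category (gc)
gc ; Lu ; Uppercase_Letter
gc ; L  ; Letter   # Ll | Lm | Lo | Lt | Lu
# Script (sc)
sc ; Grek ; Greek
sc ; Latn ; Latin
`;

// Hypothetical helper: groups value aliases by property alias.
function parsePropertyValueAliases(text) {
  const byProperty = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments and padding
    if (line === "") continue;
    const [property, ...values] = line.split(";").map((field) => field.trim());
    if (!byProperty.has(property)) byProperty.set(property, []);
    byProperty.get(property).push(values);
  }
  return byProperty;
}

const aliases = parsePropertyValueAliases(sample);
// For gc and sc entries, field 0 is the short name and field 1 the long name.
const scriptLongNames = aliases.get("sc").map((names) => names[1]);
const categoryShortNames = aliases.get("gc").map((names) => names[0]);
```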

Lonniebiz avatar Oct 16 '19 17:10 Lonniebiz

Thanks for doing that legwork! I'll try to take a look this week, and incorporate it if testing shows it to be the correct listing.
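One way such testing could be done is to let the engine itself judge each candidate, since an unknown property name or value makes the `RegExp` constructor throw. This is only a sketch of that approach; `isValidPropertyEscape` and the `candidates` list are made up for illustration.

```javascript
// Hypothetical helper: true if \p{expression} compiles with the u flag.
function isValidPropertyEscape(expression) {
  try {
    new RegExp(`\\p{${expression}}`, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Illustrative candidates: long name, short alias, category forms, and a bogus value.
const candidates = [
  "Script=Greek",
  "Script=Grek",
  "General_Category=Lu",
  "Lu",
  "Script=NotAScript",
];
const accepted = candidates.filter(isValidPropertyEscape);
// accepted holds the first four entries; "Script=NotAScript" is rejected.
```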

gskinner avatar Oct 17 '19 18:10 gskinner

Never mind; Firefox still does not support named capture groups:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

Might be good to add a warning for the 5% of us devs who still use Firefox....

tomByrer avatar Dec 07 '19 14:12 tomByrer

Good News: When Firefox 78 is released at the end of this month, it will finally support the new regular expression features that you've added to your tool. @gskinner

Lonniebiz avatar Jun 14 '20 10:06 Lonniebiz

Today, I installed Firefox version 78 and can confirm that these 2018 RegEx spec features are finally live in Firefox as well as Chromium/Chrome! @gskinner

Lonniebiz avatar Jul 01 '20 11:07 Lonniebiz

Thanks, @Lonniebiz, it does seem to work in Firefox!

tomByrer avatar Jul 11 '20 00:07 tomByrer

@tomByrer That link has one of the coolest fonts I've seen on the web and I love the style of those form elements. Very nice!

Lonniebiz avatar Jul 11 '20 16:07 Lonniebiz