ecma262 icon indicating copy to clipboard operation
ecma262 copied to clipboard

Normative: Add RegExp `v` flag with set notation and properties of strings

Open mathiasbynens opened this issue 4 years ago • 18 comments
trafficstars

Proposal repo: https://github.com/tc39/proposal-regexp-set-notation

Preview link for the spec changes with inline diffs: https://arai-a.github.io/ecma262-compare/?pr=2418

mathiasbynens avatar May 28 '21 11:05 mathiasbynens

This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?

bakkot avatar Nov 30 '21 00:11 bakkot

@FrankYFTang we don't use separated html files; can you inline the table?

ljharb avatar Dec 01 '21 22:12 ljharb

This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?

We thought that it's ok for stage 3 to still have editorial notes and explainers. We intend to eventually turn many of them into permanent NOTEs or whatever is a reasonable format, based on feedback. Ok?

markusicu avatar Dec 02 '21 00:12 markusicu

Sure, that's ok. So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.

bakkot avatar Dec 02 '21 00:12 bakkot

So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.

I believe that we covered nearly everything, but I plan to go through it myself once more with a fine-tooth comb.

I hope that this will not block stage 3 reviewers from reviewing and providing feedback.

markusicu avatar Dec 02 '21 00:12 markusicu

(Generally those things are fine in the proposal, but don't go in the spec PR)

ljharb avatar Dec 02 '21 07:12 ljharb

@FrankYFTang we don't use separated html files; can you inline the table?

See https://github.com/tc39/ecma262/blob/main/table-binary-unicode-properties.html https://github.com/tc39/ecma262/blob/main/table-nonbinary-unicode-properties.html https://github.com/tc39/ecma262/blob/main/table-unicode-general-category-values.html https://github.com/tc39/ecma262/blob/main/table-unicode-script-values.html

Notice these table are separated html files because for every new verison of Unicode standard @mathiasbynens update them. I am not sure he use some tool to generate them or by hand in the past. I just follow that

See https://github.com/tc39/ecma262/pull/1041/ https://github.com/tc39/ecma262/pull/1218

FrankYFTang avatar Dec 02 '21 10:12 FrankYFTang

What is \q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.

bakkot avatar Dec 07 '21 03:12 bakkot

What is \q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.

We flip-flopped on the string literal syntax, and went back from (string|literal) to \q{string|literal} which is what the Unicode regex spec (UTS 18) recommends. See https://github.com/tc39/proposal-regexp-set-notation/issues/33#issuecomment-931753338

This is one of a few things that we will note specially in next week's meeting.

We did edit the readme to change example string literals to this syntax.

markusicu avatar Dec 07 '21 04:12 markusicu

(Note that we're not yet asking for Stage 3 advancement, although we are asking Stage 3 reviewers to start reviewing: https://github.com/tc39/agendas/pull/1093)

mathiasbynens avatar Dec 07 '21 08:12 mathiasbynens

Incidentally this will need a rebase, mostly due to https://github.com/tc39/ecma262/pull/2531. I can take care of that for you, if you'd like. Sorry about the conflict, although I do think #2531 will make this easier to follow.

bakkot avatar Dec 09 '21 19:12 bakkot

Incidentally this will need a rebase, mostly due to #2531. I can take care of that for you, if you'd like.

That'd be lovely, if you don't mind! Thanks.

mathiasbynens avatar Dec 09 '21 22:12 mathiasbynens

OK, rebased and adopted the conventions of https://github.com/tc39/ecma262/pull/2531. Not certain I did it perfectly, but it seems to match up pretty well.

bakkot avatar Dec 11 '21 07:12 bakkot

Can the new table be dropped in light of https://github.com/tc39/ecma262/pull/2649?

bakkot avatar Feb 03 '22 00:02 bakkot

Can the new table be dropped in light of #2649?

No. That PR replaces tables of Unicode property values with a reference to where Unicode defines those. That works because ES wants to support all of the values for certain properties.

ES still needs its own tables of which properties it wants to support. It could refer to Unicode for multiple aliases per property, but ES needs to at least list each property once. There are many Unicode properties that ES regex does not support.

markusicu avatar Feb 10 '22 22:02 markusicu

AFAICT we have taken care of all of the feedback, except for editorial issues (naming, strings vs. sequences) that I hope won't block moving to stage 3. PTAL.

markusicu avatar Feb 11 '22 16:02 markusicu

@waldemarhorwat completed his Stage 3 review and confirmed that we've addressed his review feedback here: https://github.com/tc39/proposal-regexp-set-notation/issues/54

mathiasbynens avatar Mar 26 '22 22:03 mathiasbynens

I recommend that you follow #2716 (and comment on it as appropriate), which proposes to restructure regular expression processing for more explicit data propagation. It's editorial, but similar restructuring will be required here if it lands first.

gibson042 avatar Mar 31 '22 02:03 gibson042

AFAICT, remaining TODOs for this patch are:

  • [x] Follow the restructuring in #2716 and apply it to this patch.
  • [x] Rebase. (This is gonna be painful...)
  • [x] Remove <p style="background: yellow"> reviewer notes
  • [x] Address final normative TODO: https://github.com/mathiasbynens/ecma262/pull/18

mathiasbynens avatar Apr 14 '23 18:04 mathiasbynens

I’ve attempted a clean rebase and have force-pushed it here. I’ve also removed the background: yellow reviewer notes now.

In case I did something horribly wrong, here’s a backup of the previous state: https://github.com/mathiasbynens/ecma262/tree/regexp-v-flag-backup-2023-04-21

mathiasbynens avatar Apr 21 '23 08:04 mathiasbynens

Rather than making suggested changes here, I made them as a PR against regex-v-flag branch: https://github.com/mathiasbynens/ecma262/pull/19

jmdyck avatar Apr 22 '23 03:04 jmdyck

And again, after further analysis: https://github.com/mathiasbynens/ecma262/pull/20

jmdyck avatar Apr 22 '23 19:04 jmdyck

Also, in CompileAtom, there's a lot of overlap between the semantics for Atom :: CharacterClass and AtomEscape :: CharacterClassEscape. It might be good to factor that out. Not sure what it would be called though.

jmdyck avatar Apr 22 '23 19:04 jmdyck

This proposal achieved Stage 4 at last week's TC39 meeting. The only part that's missing is to get this PR across the finish line. :)

What are the remaining blockers for a final review of this PR?

There's this comment by @jmdyck, but I can't tell if it's a hard requirement or a potential post-merge refactoring/improvement:

Also, in CompileAtom, there's a lot of overlap between the semantics for Atom :: CharacterClass and AtomEscape :: CharacterClassEscape. It might be good to factor that out. Not sure what it would be called though.

mathiasbynens avatar May 24 '23 15:05 mathiasbynens

(i clicked the "rebase branch" button in the UI)

ljharb avatar May 24 '23 15:05 ljharb

I’ve rebased this and the esmeta check now passes (thanks to #3078).

Are there any other blockers preventing this PR from being accepted?

mathiasbynens avatar Jun 05 '23 07:06 mathiasbynens

@mathiasbynens No, just waiting on review from one more editor.

michaelficarra avatar Jun 06 '23 00:06 michaelficarra

Thanks everyone!

mathiasbynens avatar Jun 15 '23 21:06 mathiasbynens