ecma262
ecma262 copied to clipboard
Normative: Add RegExp `v` flag with set notation and properties of strings
Proposal repo: https://github.com/tc39/proposal-regexp-set-notation
Preview link for the spec changes with inline diffs: https://arai-a.github.io/ecma262-compare/?pr=2418
This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?
@FrankYFTang we don't use separated html files; can you inline the table?
This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?
We thought that it's ok for stage 3 to still have editorial notes and explainers. We intend to eventually turn many of them into permanent NOTEs or whatever is a reasonable format, based on feedback. Ok?
Sure, that's ok. So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.
So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.
I believe that we covered nearly everything, but I plan to go through it myself once more with a fine-tooth comb.
I hope that this will not block stage 3 reviewers from reviewing and providing feedback.
(Generally those things are fine in the proposal, but don't go in the spec PR)
@FrankYFTang we don't use separated html files; can you inline the table?
See https://github.com/tc39/ecma262/blob/main/table-binary-unicode-properties.html https://github.com/tc39/ecma262/blob/main/table-nonbinary-unicode-properties.html https://github.com/tc39/ecma262/blob/main/table-unicode-general-category-values.html https://github.com/tc39/ecma262/blob/main/table-unicode-script-values.html
Notice these table are separated html files because for every new verison of Unicode standard @mathiasbynens update them. I am not sure he use some tool to generate them or by hand in the past. I just follow that
See https://github.com/tc39/ecma262/pull/1041/ https://github.com/tc39/ecma262/pull/1218
What is \q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.
What is
\q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.
We flip-flopped on the string literal syntax, and went back from (string|literal) to \q{string|literal} which is what the Unicode regex spec (UTS 18) recommends. See https://github.com/tc39/proposal-regexp-set-notation/issues/33#issuecomment-931753338
This is one of a few things that we will note specially in next week's meeting.
We did edit the readme to change example string literals to this syntax.
(Note that we're not yet asking for Stage 3 advancement, although we are asking Stage 3 reviewers to start reviewing: https://github.com/tc39/agendas/pull/1093)
Incidentally this will need a rebase, mostly due to https://github.com/tc39/ecma262/pull/2531. I can take care of that for you, if you'd like. Sorry about the conflict, although I do think #2531 will make this easier to follow.
Incidentally this will need a rebase, mostly due to #2531. I can take care of that for you, if you'd like.
That'd be lovely, if you don't mind! Thanks.
OK, rebased and adopted the conventions of https://github.com/tc39/ecma262/pull/2531. Not certain I did it perfectly, but it seems to match up pretty well.
Can the new table be dropped in light of https://github.com/tc39/ecma262/pull/2649?
Can the new table be dropped in light of #2649?
No. That PR replaces tables of Unicode property values with a reference to where Unicode defines those. That works because ES wants to support all of the values for certain properties.
ES still needs its own tables of which properties it wants to support. It could refer to Unicode for multiple aliases per property, but ES needs to at least list each property once. There are many Unicode properties that ES regex does not support.
AFAICT we have taken care of all of the feedback, except for editorial issues (naming, strings vs. sequences) that I hope won't block moving to stage 3. PTAL.
@waldemarhorwat completed his Stage 3 review and confirmed that we've addressed his review feedback here: https://github.com/tc39/proposal-regexp-set-notation/issues/54
I recommend that you follow #2716 (and comment on it as appropriate), which proposes to restructure regular expression processing for more explicit data propagation. It's editorial, but similar restructuring will be required here if it lands first.
AFAICT, remaining TODOs for this patch are:
- [x] Follow the restructuring in #2716 and apply it to this patch.
- [x] Rebase. (This is gonna be painful...)
- [x] Remove
<p style="background: yellow">reviewer notes - [x] Address final normative TODO: https://github.com/mathiasbynens/ecma262/pull/18
I’ve attempted a clean rebase and have force-pushed it here. I’ve also removed the background: yellow reviewer notes now.
In case I did something horribly wrong, here’s a backup of the previous state: https://github.com/mathiasbynens/ecma262/tree/regexp-v-flag-backup-2023-04-21
Rather than making suggested changes here, I made them as a PR against regex-v-flag branch: https://github.com/mathiasbynens/ecma262/pull/19
And again, after further analysis: https://github.com/mathiasbynens/ecma262/pull/20
Also, in CompileAtom, there's a lot of overlap between the semantics for Atom :: CharacterClass and AtomEscape :: CharacterClassEscape. It might be good to factor that out. Not sure what it would be called though.
This proposal achieved Stage 4 at last week's TC39 meeting. The only part that's missing is to get this PR across the finish line. :)
What are the remaining blockers for a final review of this PR?
There's this comment by @jmdyck, but I can't tell if it's a hard requirement or a potential post-merge refactoring/improvement:
Also, in CompileAtom, there's a lot of overlap between the semantics for
Atom :: CharacterClassandAtomEscape :: CharacterClassEscape. It might be good to factor that out. Not sure what it would be called though.
(i clicked the "rebase branch" button in the UI)
I’ve rebased this and the esmeta check now passes (thanks to #3078).
Are there any other blockers preventing this PR from being accepted?
@mathiasbynens No, just waiting on review from one more editor.
Thanks everyone!