Type Inference for Regular Expressions
This PR adds type inference for regular expressions, enabling more accurate and fine-grained type checking. For example, the type of the following loose[^1] regex:
[^1]: It matches only a limited subset of the Temporal DateTime format and is for demonstration purposes only.
const dateTimeRegex = /^(?<date>(?<year>\d{4}|(?!-000000)[+-]\d{6})-(?!(?:0[2469]|11)-31|02-30)(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\d|3[01]))[ T](?<time>(?<hour>[01]\d|2[0-3]):(?<minute>[0-5]\d):(?<second>[0-5]\d|60))$/i;
is inferred as (prettified manually):
RegExp<
/*CapturingGroups*/ [
| `${string}-01-01 ${string}` | `${string}-01-01T${string}` | `${string}-01-01t${string}`
| `${string}-01-02 ${string}` | `${string}-01-02T${string}` | `${string}-01-02t${string}`
| `${string}-01-03 ${string}` | `${string}-01-03T${string}` | `${string}-01-03t${string}`
| ...
| `${string}-12-29 ${string}` | `${string}-12-29T${string}` | `${string}-12-29t${string}`
| `${string}-12-30 ${string}` | `${string}-12-30T${string}` | `${string}-12-30t${string}`
| `${string}-12-31 ${string}` | `${string}-12-31T${string}` | `${string}-12-31t${string}`,
`${string}-01-01` | `${string}-01-02` | `${string}-01-03` | ... | `${string}-12-29` | `${string}-12-30` | `${string}-12-31`,
string,
"01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12",
"01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31",
string,
"00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23",
"00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31" | "32" | "33" | "34" | "35" | "36" | "37" | "38" | "39" | "40" | "41" | "42" | "43" | "44" | "45" | "46" | "47" | "48" | "49" | "50" | "51" | "52" | "53" | "54" | "55" | "56" | "57" | "58" | "59",
"00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31" | "32" | "33" | "34" | "35" | "36" | "37" | "38" | "39" | "40" | "41" | "42" | "43" | "44" | "45" | "46" | "47" | "48" | "49" | "50" | "51" | "52" | "53" | "54" | "55" | "56" | "57" | "58" | "59" | "60",
],
/*NamedCapturingGroups*/ {
year: string;
month: "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12";
day: "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31";
date: `${string}-01-01` | `${string}-01-02` | `${string}-01-03` | ... | `${string}-12-29` | `${string}-12-30` | `${string}-12-31`;
hour: "00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23";
minute: "00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31" | "32" | "33" | "34" | "35" | "36" | "37" | "38" | "39" | "40" | "41" | "42" | "43" | "44" | "45" | "46" | "47" | "48" | "49" | "50" | "51" | "52" | "53" | "54" | "55" | "56" | "57" | "58" | "59";
second: "00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31" | "32" | "33" | "34" | "35" | "36" | "37" | "38" | "39" | "40" | "41" | "42" | "43" | "44" | "45" | "46" | "47" | "48" | "49" | "50" | "51" | "52" | "53" | "54" | "55" | "56" | "57" | "58" | "59" | "60";
time: string;
},
/*Flags*/ {
hasIndices: false;
global: false;
ignoreCase: true;
multiline: false;
dotAll: false;
unicode: false;
unicodeSets: false;
sticky: false;
}
>
To support this, RegExp is now a generic type, and the library typings related to it have been extensively modified. The typings now automatically determine which type String#match returns, RegExpExecArray<CapturingGroups, NamedCapturingGroups, Flags> or RegExpMatchArray<CapturingGroups>:
dateTimeString.match(dateTimeRegex);
// RegExpExecArray</*CapturingGroups*/ [ ... ], /*NamedCapturingGroups*/ { ... }, /*Flags*/ { ... }> | null
Or if the global flag is set:
const dateTimeRegex = /^(?<date>(?<year>\d{4}|(?!-000000)[+-]\d{6})-(?!(?:0[2469]|11)-31|02-30)(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\d|3[01]))[ T](?<time>(?<hour>[01]\d|2[0-3]):(?<minute>[0-5]\d):(?<second>[0-5]\d|60))$/ig;
dateTimeString.match(dateTimeRegex);
// RegExpMatchArray</*CapturingGroups*/ [ ... ]> | null
// === [CapturingGroups[0], ...CapturingGroups[0][]] | null
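The split between the two return types mirrors what String#match already does at runtime. The snippet below is a small sketch using today's standard typings (not the generic RegExp from this PR) and a simplified, made-up time pattern; it only illustrates why the global flag changes the shape of the result.

```typescript
// Simplified pattern for illustration only; not the dateTimeRegex above.
const input = "10:30 and 23:59";

// Without the global flag, String#match behaves like RegExp#exec:
// element 0 is the full match, followed by the capturing groups,
// and the array also carries `index`.
const single = input.match(/(\d{2}):(\d{2})/);

// With the global flag, only the full matches are returned;
// the capturing groups and `index` are not available.
const all = input.match(/(\d{2}):(\d{2})/g);

const singleSummary = single && {
  full: single[0],
  hour: single[1],
  minute: single[2],
  index: single.index,
};
const allMatches = all && all.slice();
```

Here singleSummary holds the full match plus both groups, while allMatches is just the list of full matches, which is the distinction the typings encode.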
This PR can be considered a follow-up to #55600.
Implementation
This implementation is tailored to TypeScript. It doesn't create any additional syntax nodes. Instead, during scanning in scanRegularExpressionWorker, it feeds strings into RegularExpressionPatterns (actually arrays) and RegularExpressionPatternUnion (actually Sets)[^2] (temporary names, to be bikeshedded), stores them per capturing group, and passes them to checkRegularExpressionLiteral in the checker.
[^2]: I didn't name it RegularExpressionDisjunction or RegularExpressionAlternatives as it's also used as a container for character classes.
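To make the "patterns as arrays, unions as Sets" idea more concrete, here is a minimal standalone sketch. It is not the code in this PR: the type names Pattern, PatternPiece and PatternUnion and the helpers expand, expandUnion and digits are hypothetical and exist only for this illustration. It models a pattern as a sequence of pieces and a union as a set of alternative patterns, then enumerates the strings they describe.

```typescript
// Hypothetical shapes for illustration; the PR's internal types may differ.
type PatternPiece = string | PatternUnion; // literal text or a set of alternatives
type Pattern = PatternPiece[];             // pieces concatenated left to right
type PatternUnion = Set<Pattern>;          // alternatives (also usable for character classes)

// Enumerate every string a pattern can produce (cartesian product of its pieces).
function expand(pattern: Pattern): string[] {
  let results: string[] = [""];
  for (const piece of pattern) {
    const options = typeof piece === "string" ? [piece] : expandUnion(piece);
    const next: string[] = [];
    for (const prefix of results) {
      for (const option of options) next.push(prefix + option);
    }
    results = next;
  }
  return results;
}

// Enumerate every string any alternative of a union can produce, without duplicates.
function expandUnion(union: PatternUnion): string[] {
  const out = new Set<string>();
  union.forEach(alternative => expand(alternative).forEach(s => out.add(s)));
  return Array.from(out);
}

// Helper: a character class of consecutive digits, e.g. [1-9].
function digits(from: number, to: number): PatternUnion {
  const set: PatternUnion = new Set<Pattern>();
  for (let d = from; d <= to; d++) set.add([String(d)]);
  return set;
}

// The month group from the example regex: 0[1-9]|1[0-2]
const month: PatternUnion = new Set<Pattern>([
  ["0", digits(1, 9)],
  ["1", digits(0, 2)],
]);
const monthStrings = expandUnion(month);
```

Under these assumptions, monthStrings contains the twelve strings "01" through "12", which corresponds to the union inferred for the month group above.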
I could have moved the whole scanRegularExpressionWorker to the checker, but I chose to keep it in the scanner for easier reviewing, and because scanEscapeSequence is referenced in the worker function and moving it out would create duplicate code.
This is probably not the intended code structure. However, I don't think I am the right person to make large changes to the codebase structure, which is one of the reasons I kept the changes minimal. (It's still a significant number of lines, though.)
This PR is a breaking change. In the worst case, a new flag in tsconfig.json might be necessary if it proves too disruptive.
There are currently a few tricky workarounds that don't seem acceptable in the TypeScript codebase. For example, some overloads in es5.d.ts are redeclared in es2015.symbol.wellknown.d.ts so that they are prioritized. Nevertheless, things do behave as intended. Besides, there are some underscore types in es5.d.ts due to #2225. Although they can be eliminated by modularizing the file, as was done for esnext.iterator.d.ts, I chose not to do so for the time being, pending feedback from the TS team.
In addition, I have only undertaken a limited number of memory and performance optimizations given my lack of experience in this area. I am therefore seeking assistance from the TS team in this regard.
Blockers
Besides #2225 mentioned above, #51751 and #45972 are also blockers of this PR.
Due to #51751, CapturingGroupsArray can't be typed as [string, ...(string | undefined)[]], which causes capturingGroups_0: string to be missing before capturingGroups: (string | undefined)[] in the type of StringReplaceCallbackSignature (with default type parameters).
#45972 causes assignability issues between existing functions and StringReplaceCallbackSignature with default type parameters, which happens when RegExp is constructed from other sources, e.g. by its constructor with a string.
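For context on why the replace callback's signature depends on the capturing groups, the sketch below shows the arguments String#replace passes to a callback at runtime. It uses today's standard typings, where the callback is typed as (substring: string, ...args: any[]) => string, and a simplified, made-up pattern; it does not use StringReplaceCallbackSignature from this PR.

```typescript
// The callback receives: the full match, one argument per capturing group,
// then the match offset and the whole input string.
const reordered = "2024-05".replace(
  /(\d{4})-(\d{2})/,
  (match: string, year: string, month: string, offset: number, input: string) =>
    `${month}/${year} (matched "${match}" at ${offset} in "${input}")`,
);
```

With two capturing groups the callback sees two group arguments before offset and input; a regex with a different number of groups shifts those positions, which is what a group-aware callback type has to model.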
Unhandled cases
- If every alternative in a disjunction has a group with the same name, that named capturing group must not be undefined
Fixes
(Should these be split out into a separate PR?)
- Emit errors on \k in character classes when hasNamedCapturingGroups is true
Known issues
I didn’t manage to fix the following before my trip:
- \k<foo> is inferred as the empty string instead of "foo" in /((?<foo>foo))\k<foo>/
- No TS1515 for /((?<foo>))(?<foo>)/ (essentially the same issue as the above)
In the worst case, if there is no activity on this PR for a month or two, I would appreciate any help in resolving them.
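As a runtime reference for the backreference issue, the sketch below shows what \k<foo> actually matches in the first regex. It relies only on standard RegExp behaviour and needs an ES2018+ target for named groups.

```typescript
// The named group captures "foo", so the backreference \k<foo> must match "foo" again.
const backrefMatch = /((?<foo>foo))\k<foo>/.exec("foofoo");

const fullMatch = backrefMatch ? backrefMatch[0] : null; // the whole matched text
const fooGroup = backrefMatch?.groups?.foo ?? null;      // what the group captured

// A single "foo" is not enough, because the backreference needs a second copy.
const matchesSingleFoo = /((?<foo>foo))\k<foo>/.test("foo");
```

Since the only string this regex matches in full is "foofoo", the backreference contributes "foo" rather than the empty string.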
Other known issues (Probably unfixable)
- new RegExp(/a/, "i") does not trigger a rescan, so the inferred type stays "a" instead of becoming "a" | "A"
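The runtime behaviour behind this limitation can be seen with standard RegExp semantics (ES2015+): passing new flags to the constructor changes what the regex matches, even though the pattern comes from an existing literal.

```typescript
const source = /a/;
const insensitive = new RegExp(source, "i"); // same pattern, new flags

const sourceMatchesUpper = source.test("A");           // false: case-sensitive
const insensitiveMatchesUpper = insensitive.test("A"); // true: the "i" flag applies
const copiedPattern = insensitive.source;              // "a"
const newFlags = insensitive.flags;                    // "i"
```

A type inferred from the literal /a/ alone therefore no longer describes everything the new regex can match.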
Fixes #32098 Closes #50452