Propagate captured RegExp groups to the output value
Related to #695 and possibly related to #491 (specifically similar to the second half of this comment), it would be reasonable to assume that RegExp fields should be able to use capture groups and be able to propagate those groups to the result.
In practice, when a RegExp field is parsed, if the pattern contains capturing groups, those groups should be accessible from the resulting value as a tuple property groups. In order to maintain parity with vanilla JS RegExp and standard regular expression behavior, the first value of the tuple should be the entire matched string with the matched groups beginning at index 1.
Examples:
// should simply result in a string
const Foo = type("/^abcd$/");
const value = Foo.assert("abcd"); // 'abcd'
// should result in a union of the matched string and a group tuple
const Foo = type("/^(abc)d$/");
const value = Foo.assert("abcd"); // 'abcd' | { groups: ['abcd', 'abc'] }
const [_, group] = value.groups;
// should result in a union of the matched string and a group tuple
const Foo = type("^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/");
const value = Foo.assert("221B Baker Street, London, NW1 6XE"); // '221B Baker Street, London, NW1 6XE' | { groups: ['221B Baker Street, London, NW1 6XE', '221B', 'Baker Street', 'London', 'NW1 6XE'] }'
const [address, houseNumber, road, city, postCode] = value.groups;
Definitely would love to add this! Will have to think a bit about the syntax to differentiate from a pattern-only regex, but would be a very valuable API to have, especially if we can make it type safe.
As with #695, would welcome external contributions here and would be happy to answer questions about the type system and help get it integrated smoothly!
A low hanging fruit implementation would be to simply make all returned RegExp values be typed as string & { groups?: string[] }, and have pattern-only regex values leave groups as undefined. This should enable it to remain backward compatible with all current uses of it with only the minor inconvenience of having to deal with the nullability of groups in the new case. (e.g. const [...] = value.groups ?? [];)
At my level of understanding regarding ArkType's typing system (read: it might as well be straight up witchcraft), I'm not sure how to get beyond that point.
@Abion47 I assume you're referring to adding groups as a prop at runtime on the result type?
Unfortunately, although encoding extra information on a primitive via branding works at a type-level, it is impossible to attach additional properties to a string.
That's an interesting factoid I did not know. Mark that as one more entry under the "Javascript has rules except when it chooses to break them" category.
Still, it seems to work if you target String rather than string.
type StringWithProp = string & { groups?: string[] };
function foo(): StringWithProp {
const s: any = new String('abc');
s.groups = ['a', 'b', 'c'];
return s as StringWithProp;
}
function test(s: string) {
console.log(s, `${s}`);
}
const bar = foo();
console.log(bar); // String: "abc"
console.log(bar.groups); // ["a", "b", "c"]
console.log(typeof bar === 'string'); // false
test(bar); // String: "abc", "abc"
That's not quite a solution as typeof value === 'string' will no longer be true, but value as string seems to work just fine. I can see why a lot of linters default to not wanting you to use String.
In that case, there doesn't seem to be a backward compatible way to implement this simply - so much for the low hanging fruit option. I don't see another real solution beyond returning a compound type in the event that a RegExp contains capture groups. Or perhaps it just returns the tuple itself instead of attaching it to a property:
// Returns a string if there are no capture groups
const SimpleRegExp = type("/^abcd$/");
const value = SimpleRegExp.assert("abcd"); // 'abcd'
// Returns a tuple containing the captured groups
const GroupedRegExp = type("^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/");
const [address, houseNumber, road, city, postCode] = GroupedRegExp.assert("221B Baker Street, London, NW1 6XE");
@Abion47 Yeah this is a whole mess, trust me you don't want to start turning strings into Strings 😅
It would need its own dedicated syntax, but TBH I'd probably go for that regardless because the use cases for transforming a string into groups and checking if it matches a pattern are pretty distinct and the API will likely be a lot clearer if it reflects that rather than trying to cram multiple results together.
How about this. What if in addition to type and scope, there was a pattern type factory that was responsible for handling stuff like this? It can be initially backed by regular expressions, but in the future it could also be expanded to other types of complex pattern matching.
const LondonAddress = pattern({
// The presence of the regex property indicates that this is a RegExp powered pattern
regex: ^/(\d{1,3}[A-Z])? ([A-Za-z ']+), ([A-Za-z \-]+), ([A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2})$/,
// Optionally, the groups property assigns a type to each matched group
// If the type system's regex parsing can manage it, this could possibly be inferred
groups: [
// Should group 0 be required to be specified or should it be assumed to just be "string"?
"/^\d{1,3}[A-Z]$/",
"/^[A-Za-z ']+$/",
"/^[A-Za-z \-]+$/",
"/^[A-Z]{2}\d{1,2} \d{1,2}[A-Z]{2}$/",
],
});
const [address, houseNumber, road, city, postCode] = LondonAddress.assert("221B Baker Street, London, NW1 6XE");
@Abion47 Hmm, in the end it would just be a special syntax for a type with a morph attached (i.e. something like type("string").pipe(s => regex.exec(s))).
In general the philosophy has been to offer syntax + chainable methods for APIs like this rather than a new API entry point.
Honestly, I'm very confident we can come up with a reasonable API for it, but I just don't want to think about it until someone is interested in implementing it since I need to focus on docs for now 😛
If you or someone else wants to take a stab at it though, let me know and I will commit to something.