proposal-array-grouping
proposal-array-grouping copied to clipboard
I still think Array#partition will be great as well
Original proposal: https://github.com/tc39/proposal-array-filtering/issues/2
Ofc with groupBy we could be emulate partition as:
function partition(arr, fn) {
return Object.values(arr.groupBy(idx => Number(!fn(idx))));
}
or
function partition(arr, fn) {
return Object.assign([], arr.groupBy(idx => Number(!fn(idx))));
}
But why not just present it as a first class citizen?
Returning a keyed object is much cleaner than spitting out a tuple of values (your emulation doesn’t sort the object keys, btw), from an api design perspective.
your emulation doesn’t sort the object keys, btw
This is another reason that it would be nice to have a correct method. Returning a tuple in combination with a destructive operation is much more convenient if we have only group of two values:
const { true: a, false: b } = arr.groupBy(x => x > 2);
vs
const [a, b] = arr.partition(x => x > 2);
In that example though, should the trues or the falses come first? I believe many languages do it differently.
I believe many languages do it differently.
That's doesn't matter. I'm just demonstrating ergonomics
It matters if it means we couldn’t come up with a result that was consistently intuitive - in other words, if the order isn’t intuitive, it is thus far less ergonomic.
I believe many languages do it differently.
Btw I'm not sure about this. Here Scala example:
val x = List(15, 10, 5, 8, 20, 12)
val (a, b) = x.partition(_ > 10)
> a: List[Int] = List(15, 20, 12)
> b: List[Int] = List(10, 5, 8)
Btw Scala also have groupBy and also span and splitAt
In that example though, should the trues or the falses come first? I believe many languages do it differently.
Trues first. We can see the related links in the original issue.
Implementation in other languages:
Implementation in JS libs:
The first list is for the true predicate. The second is for the false. Valid for all the examples. It seems intuitive to me. We "separate the wheat from the chaff". Can you give an example of languages or libraries where this method is implemented the other way around?
One more thought.
Of course, this will not be a weighty argument, but still I would like to add it.
When using a typescript, we can use typeguards as a predicate. This will allow for easy separation of types, which (as it seems to me) cannot be done through a groupBy.
Another argument for addision partition method is that runtimes can be much more efficient with it, since the result is always a tuple of two elements unlike groupBy which return an object with an unknown number of entries.
I'm going to try and brainstorm a couple of ways in which we could make it clear whether "true" means first or second element of array. Not that any of these ideas are any good, I'm mostly just thinking outloud.
We could have "partition" act like "groupBy", except it inserts results into an array instead of an object.
const [minors, adults] = users.partition(user => user.age >= 18 ? 1 : 0)
const [minors, adults] = users.partition(user => Number(user.age >= 18))
const [minors, adults] = users.partition(user => user.age >= 18) // Possible if the result is auto-coerced into a number. But, this would cause partition() to behave backwards from any other language.
const [infants, oneYearOlds, twoYearOlds, ...] = users.partition(user => user.age)
We could try and find a different name for "partition" that indicates the ordering:
// Horrible name, but the ordering is right in the name
const [adults, minors] = users.trueFalseGrouping(user => user.age >= 18)
// Maybe ...
const [adults, minors] = users.selectOrReject(user => user.age >= 18)
I'm also perfectly happy with just doing this:
const { true: adults, false: minors } = users.groupBy(user => user.age >= 18)
For how often I need a partition behavior like this, just using groupBy seems simple enough, and it's very readable for everyone else looking at the code.
I'd note that it would be a little ironic if we chose to introduce the partition method with this kind of ambiguity when this whole repo originated because of an attempt to provide an alternative filter method name that was less ambiguous than "filter".
Maybe @rbuckton could weigh in on whether groupBy can be easily typed to provide a similar experience as https://github.com/tc39/proposal-array-grouping/issues/9#issuecomment-896691901.
Can you give an example of languages or libraries where this method is implemented the other way around?
Swift, but it's the only one I know of. true-first is a very firm precedent for people coming from other languages.
I still think const { true: a, false: b } = arr.groupBy(x => x > 2); is clearer, though, since it does not require readers to know this non-obvious fact.
@bakkot The Swift's partition only changes the order of the elements, and does not divide into two lists. This is closer to sorting than grouping.
I think one issue with true-first is that result[true] would give index 1, and result[false] index 0 - ie, the reverse. The object form doesn’t suffer from this.
@dartess It's common for functions like this in languages which aren't functional to mutate rather than returning new lists. (See e.g. std::partition in C++.) It should be understood as being the same conceptual function, just implemented in the appropriate manner for the language.
It is also sometimes very convenient when the partitioned/splitted result provides an array instead entries. Splitting in two groups and get middle (first, last, whatever) element for each group:
[1, 2, 3, 4, 5, 6].partition(x => !(x > 3)).map(x => x[x.length >>> 1])
// ^ ^
output:
[ 2, 5 ]
Alright - I'm going to copy-paste your example but break it up over multiple lines, because I'm going to compare another code snippets against it:
[1, 2, 3, 4, 5, 6]
.partition(x => !(x > 3))
.map(x => x[x.length >>> 1])
I can see how something like that would be nice. Though IMO, it's not too bad to use .groupBy() for this either, it's just one more step, and that extra step is used to clearly show what the ordering of the final result will be.
// I'm using the pipeline operator, because I like it
[1, 2, 3, 4, 5, 6]
|> %.groupBy(x => x > 3)
|> [%.false, %.true]
|> %.map(x => x[x.length >>> 1])
It depends on how you want it typed. The groupBy I use in @esfx/iter-query can be found here: https://github.com/esfx/esfx/blob/b502788fd4f20a95626e6988ef86f459aa92d15c/packages/iter-fn/src/grouping.ts#L275-L304
For a general groupBy operation, I'd want to be able to group by more than just "stringified" (or symbol) keys:
import { from } from "@esfx/iter-query";
// generalization of something like an AST
const a = { name: "a" };
const b = { name: "b" };
const a_child = { parent: a };
const b_child1 = { parent: b };
const b_child2 = { parent: b };
const nodes = [a_child, b_child1, b_child2];
from(nodes)
.groupBy(node => node.parent)
.map(({ key, values }) => {
// key: reference to `a` or `b` above
// values: an iterable of each value with the above key.
});
A groupBy that only supports "stringified" keys wouldn't meet my needs of a number of places where I have used groupBy in other libraries or languages. That limitation would likely keep me from using it in a number of cases.
In @esfx/iter-query I would do the following to create a keyed object:
const { true: a, false: b } = from([1, 2, 3, 4, 5, 6])
.groupBy(x => x > 2)
.toObject(null, ({ key }) => `${key}`, ({ values }) => [...values]);
I also do plan on bringing something to TC39 in the near future around key equality (i.e., the ability to override key equality for a Map or Set and could also apply to an operation like groupBy, but does not apply to == or ===). I've been talking about it on and off for a few years but haven't had the time to formalize a proposal due to my existing workload.
Leaving aside the naming discussion, a partition into "truthy" and "falsy" values could be typed in TypeScript:
declare function partition<T, U extends T>(values: Iterable<T>, predicate: (value: T) => value is U): [U[], T[]];
declare function partition<T>(values: Iterable<T>, predicate: (value: T) => boolean): [T[], T[]];
declare class Animal {}
declare class Dog extends Animal { bark(): void; }
declare function isDog(value: Animal): value is Dog;
declare const animals: Animal[];
const [a, b] = partition(animals, isDog);
a[0].bark(); // ok
b[0].bark(); // TypeError
Or as { true: U[], false: T[] }, etc.
@rbuckton for the question of how to handle use cases which want to avoid stringification, see https://github.com/tc39/proposal-array-grouping/issues/3.
The decision to have object-return or Map-return is separate, I think. I'm just trying to see if TypeScript will be able to intelligently inspect the output of a groupBy keyer:
function keyer(value: Animal): "dog" | "cat" | "other" {
return isDog(value) ? "dog" : isCat(value) ? "cat" : "other";
}
const { dog, cat, other } = animals.groupBy(keyer);
// valid?
dog[0].bark();
cat[0].meow();
other[0].name(); // assuming all animals have a name
I don’t see why it wouldn’t - ReturnType<keyer> should be able to be used just fine by the groupBy types.
There's no easy way to correlate the "is a" relationship between a string literal and a type guard, in the generic sense. At least, not in a way that just works without a lot of scaffolding:
declare function groupBy<T, M extends Record<string, T>>(
values: Iterable<T>,
keySelector: (value: T) => keyof M
): { [P in keyof M]: M[P][] };
You can write the generic definition, but there's no way to infer M from keySelector, so the only way to call the function would be by fully supplying the type arguments:
interface AnimalKeyMap {
dog: Dog;
cat: Cat;
other: Animal;
}
// this works:
groupBy<Animal, AnimalKeyMap>(animals, animal => animal instanceof Dog ? "dog" : animal instanceof Cat ? "cat" : "other");
// this doesn't:
groupBy(animals, animal => animal instanceof Dog ? "dog" : animal instanceof Cat ? "cat" : "other");
I don’t see why it wouldn’t -
ReturnType<keyer>should be able to be used just fine by the groupBy types.
You would just end up with dog having type Animal[], since we can't correlate "dog" with Dog through inference.
That said, I believe you can have an overload which specifically works for the partition use case mentioned above:
declare function groupBy<T, U extends T>(values: T[], predicate: (value: T) => value is U): { true: U[], false: T[] }
declare function groupBy<T>(values: T[], mapper: (value: T) => string): { [key: string] : T[] }
A groupBy into a non iterable/iterator still feels wrong to me, especially if you have more you want to do with the grouped results.
Here's an example from the tooling we use to generate GitHub PR comments for on-demand benchmarks in TypeScript PRs:
function formatReport(benchmark: Benchmark, options: BenchmarkOptions, noStyles: boolean) {
const cell = formatCell.bind(undefined, noStyles);
const measurements = from(benchmark.measurements)
.flatMap(measurement => measurement.pivot())
.orderBy(measurement => measurement.scenarioIndex)
.thenBy(measurement => measurement.hostIndex)
.thenBy(measurement => measurement.metricIndex);
return html`
<b>Benchmark Report</b>
<table border="0" cellpadding="0" cellspacing="0" ${noStyles ? "" : `class="comparison"`}>
<thead>
<tr>
<th ${noStyles ? "align=left" : `class="metric"`}>${cell("Metric")}</th>
<th ${noStyles ? "align=right" : `class="current mean"`}>${cell("Average")}</th>
<th ${noStyles ? "align=right" : `class="current best"`}>${cell("Best")}</th>
<th ${noStyles ? "align=right" : `class="current worst"`}>${cell("Worst")}</th>
</tr>
</thead>${
measurements
.groupBy(measurement => measurement.measurementName)
.map(group => html`
<tr class="group"><th ${noStyles ? "align=left" : ""} colspan="4">${cell(group.key)}</th></tr>
<tbody>${
from(group)
.map(measurement => html`
<tr class="${getMeasurementClassNames(measurement)}">
<td ${noStyles ? "align=left" : `class="${getMetricClassNames(measurement)}"`}>${cell(measurement.metric)}</td>
<td ${noStyles ? "align=right" : `class="${getCurrentMeanClassNames(measurement)}"`}>${cell(formatMean(measurement))}</td>
<td ${noStyles ? "align=right" : `class="current best"`}>${cell(formatUnit(measurement.minimum, measurement))}</td>
<td ${noStyles ? "align=right" : `class="current worst"`}>${cell(formatUnit(measurement.maximum, measurement))}</td>
</tr>`)}
</tbody>`)}
</table>`;
}
Here I'm using a .groupBy into a .map. If the result isn't iterable, then I'd have to write
Object.entries(measurements.groupBy(measurement => measurement.measurementName))
.map(group => html`...`)
Perhaps another approach might be to have the result of groupBy be iterable and support stringified/symbol keys, in a way not too dissimilar from the groups property of result of RegExp.prototype.exec()?
In TS parlance, something like this:
type GroupByGroups<K, V, U extends V = V> =
& { [P in Extract<K, string | symbol>]: V[] } // handles `string | symbol`
& { [P in Extract<K, number> as `${P}`]: V[] } // handles Number.prototype.toString()
& { [P in Extract<K, true> as `true`]: U[] } // handles `true` -> `"true"`
& { [P in Extract<K, false> as `false`]: V[] } // handles `false` -> `"false"`;
interface GroupByResultArray<K, V, U extends V = V> extends Array<[K, V[]]> {
groups: GroupByGroups<K, V, U>;
}
declare function groupBy<T, U extends T>(values: Iterable<T>, keySelector: (value: T) => value is U): GroupByResultArray<boolean, T, U>;
declare function groupBy<T, K>(values: Iterable<T>, keySelector: (value: T) => K): GroupByResultArray<K, T>;
// works like this
const { true: a, false: b } = groupBy([dog, cat, bear], (dog): dog is Dog).groups;
a; // Dog[]
b; // Animal[]
// or like this
const { foo, bar } = groupBy([{ name: "foo" }, { name: "bar" }], x => x.name).groups;
foo; // { name: string }[]
bar; // { name: string }[]
// or like this
for (const [key, values] of groupBy(people, person => person.familyName)) {
}
In that way you get the best of both worlds, and there's parity with another language feature so the behavior is easy to explain using existing functionality.
Though the above still isn't fully type safe if your keySelector returns "true" instead of true. Its not impossible to write the type, but becomes much more complex in order to handle the corner cases.
@rbuckton Those comments seem like they belong in #3, rather than this thread.
@bakkot that comment is a bit of a mixed bag, sorry. I posted it here to bring up the type safety aspects, though the other aspects of the comment would make sense for #3.