proposal-array-grouping icon indicating copy to clipboard operation
proposal-array-grouping copied to clipboard

I still think Array#partition will be great as well

Open MaxGraey opened this issue 3 years ago • 30 comments

Original proposal: https://github.com/tc39/proposal-array-filtering/issues/2

Ofc with groupBy we could be emulate partition as:

function partition(arr, fn) {
  return Object.values(arr.groupBy(idx => Number(!fn(idx))));
}

or

function partition(arr, fn) {
  return Object.assign([], arr.groupBy(idx => Number(!fn(idx))));
}

But why not just present it as a first class citizen?

MaxGraey avatar Jul 15 '21 08:07 MaxGraey

Returning a keyed object is much cleaner than spitting out a tuple of values (your emulation doesn’t sort the object keys, btw), from an api design perspective.

ljharb avatar Jul 15 '21 14:07 ljharb

your emulation doesn’t sort the object keys, btw

This is another reason that it would be nice to have a correct method. Returning a tuple in combination with a destructive operation is much more convenient if we have only group of two values:

const { true: a, false: b } = arr.groupBy(x => x > 2);

vs

const [a, b] = arr.partition(x => x > 2);

MaxGraey avatar Jul 15 '21 14:07 MaxGraey

In that example though, should the trues or the falses come first? I believe many languages do it differently.

ljharb avatar Jul 15 '21 14:07 ljharb

I believe many languages do it differently.

That's doesn't matter. I'm just demonstrating ergonomics

MaxGraey avatar Jul 15 '21 15:07 MaxGraey

It matters if it means we couldn’t come up with a result that was consistently intuitive - in other words, if the order isn’t intuitive, it is thus far less ergonomic.

ljharb avatar Jul 15 '21 15:07 ljharb

I believe many languages do it differently.

Btw I'm not sure about this. Here Scala example:

val x = List(15, 10, 5, 8, 20, 12)
val (a, b) = x.partition(_ > 10)

> a: List[Int] = List(15, 20, 12)
> b: List[Int] = List(10, 5, 8)

Btw Scala also have groupBy and also span and splitAt

MaxGraey avatar Jul 15 '21 15:07 MaxGraey

In that example though, should the trues or the falses come first? I believe many languages do it differently.

Trues first. We can see the related links in the original issue.

Implementation in other languages:

Implementation in JS libs:

The first list is for the true predicate. The second is for the false. Valid for all the examples. It seems intuitive to me. We "separate the wheat from the chaff". Can you give an example of languages or libraries where this method is implemented the other way around?

dartess avatar Aug 11 '21 09:08 dartess

One more thought.

Of course, this will not be a weighty argument, but still I would like to add it.

When using a typescript, we can use typeguards as a predicate. This will allow for easy separation of types, which (as it seems to me) cannot be done through a groupBy.

Example

dartess avatar Aug 11 '21 10:08 dartess

Another argument for addision partition method is that runtimes can be much more efficient with it, since the result is always a tuple of two elements unlike groupBy which return an object with an unknown number of entries.

MaxGraey avatar Aug 11 '21 10:08 MaxGraey

I'm going to try and brainstorm a couple of ways in which we could make it clear whether "true" means first or second element of array. Not that any of these ideas are any good, I'm mostly just thinking outloud.

We could have "partition" act like "groupBy", except it inserts results into an array instead of an object.

const [minors, adults] = users.partition(user => user.age >= 18 ? 1 : 0)
const [minors, adults] = users.partition(user => Number(user.age >= 18))
const [minors, adults] = users.partition(user => user.age >= 18) // Possible if the result is auto-coerced into a number. But, this would cause partition() to behave backwards from any other language.
const [infants, oneYearOlds, twoYearOlds, ...] = users.partition(user => user.age)

We could try and find a different name for "partition" that indicates the ordering:

// Horrible name, but the ordering is right in the name
const [adults, minors] = users.trueFalseGrouping(user => user.age >= 18)

// Maybe ...
const [adults, minors] = users.selectOrReject(user => user.age >= 18)

I'm also perfectly happy with just doing this:

const { true: adults, false: minors } = users.groupBy(user => user.age >= 18)

For how often I need a partition behavior like this, just using groupBy seems simple enough, and it's very readable for everyone else looking at the code.

I'd note that it would be a little ironic if we chose to introduce the partition method with this kind of ambiguity when this whole repo originated because of an attempt to provide an alternative filter method name that was less ambiguous than "filter".

theScottyJam avatar Aug 11 '21 18:08 theScottyJam

Maybe @rbuckton could weigh in on whether groupBy can be easily typed to provide a similar experience as https://github.com/tc39/proposal-array-grouping/issues/9#issuecomment-896691901.

jridgewell avatar Aug 11 '21 19:08 jridgewell

Can you give an example of languages or libraries where this method is implemented the other way around?

Swift, but it's the only one I know of. true-first is a very firm precedent for people coming from other languages.

I still think const { true: a, false: b } = arr.groupBy(x => x > 2); is clearer, though, since it does not require readers to know this non-obvious fact.

bakkot avatar Aug 11 '21 19:08 bakkot

@bakkot The Swift's partition only changes the order of the elements, and does not divide into two lists. This is closer to sorting than grouping.

dartess avatar Aug 11 '21 20:08 dartess

I think one issue with true-first is that result[true] would give index 1, and result[false] index 0 - ie, the reverse. The object form doesn’t suffer from this.

ljharb avatar Aug 11 '21 20:08 ljharb

@dartess It's common for functions like this in languages which aren't functional to mutate rather than returning new lists. (See e.g. std::partition in C++.) It should be understood as being the same conceptual function, just implemented in the appropriate manner for the language.

bakkot avatar Aug 11 '21 20:08 bakkot

It is also sometimes very convenient when the partitioned/splitted result provides an array instead entries. Splitting in two groups and get middle (first, last, whatever) element for each group:

[1, 2, 3, 4, 5, 6].partition(x => !(x > 3)).map(x => x[x.length >>> 1])
//  ^        ^

output:

[ 2, 5 ]

MaxGraey avatar Aug 11 '21 21:08 MaxGraey

Alright - I'm going to copy-paste your example but break it up over multiple lines, because I'm going to compare another code snippets against it:

[1, 2, 3, 4, 5, 6]
  .partition(x => !(x > 3))
  .map(x => x[x.length >>> 1])

I can see how something like that would be nice. Though IMO, it's not too bad to use .groupBy() for this either, it's just one more step, and that extra step is used to clearly show what the ordering of the final result will be.

// I'm using the pipeline operator, because I like it
[1, 2, 3, 4, 5, 6]
  |> %.groupBy(x => x > 3)
  |> [%.false, %.true]
  |> %.map(x => x[x.length >>> 1])

theScottyJam avatar Aug 11 '21 21:08 theScottyJam

It depends on how you want it typed. The groupBy I use in @esfx/iter-query can be found here: https://github.com/esfx/esfx/blob/b502788fd4f20a95626e6988ef86f459aa92d15c/packages/iter-fn/src/grouping.ts#L275-L304

For a general groupBy operation, I'd want to be able to group by more than just "stringified" (or symbol) keys:

import { from } from "@esfx/iter-query";

// generalization of something like an AST
const a = { name: "a" };
const b = { name: "b" };
const a_child = { parent: a };
const b_child1 = { parent: b };
const b_child2 = { parent: b };
const nodes = [a_child, b_child1, b_child2];

from(nodes)
  .groupBy(node => node.parent) 
  .map(({ key, values }) => {
    // key: reference to `a` or `b` above
    // values: an iterable of each value with the above key.
  });

A groupBy that only supports "stringified" keys wouldn't meet my needs of a number of places where I have used groupBy in other libraries or languages. That limitation would likely keep me from using it in a number of cases.

In @esfx/iter-query I would do the following to create a keyed object:

const { true: a, false: b } = from([1, 2, 3, 4, 5, 6])
  .groupBy(x => x > 2) 
  .toObject(null, ({ key }) => `${key}`, ({ values }) => [...values]);

I also do plan on bringing something to TC39 in the near future around key equality (i.e., the ability to override key equality for a Map or Set and could also apply to an operation like groupBy, but does not apply to == or ===). I've been talking about it on and off for a few years but haven't had the time to formalize a proposal due to my existing workload.

rbuckton avatar Aug 11 '21 23:08 rbuckton

Leaving aside the naming discussion, a partition into "truthy" and "falsy" values could be typed in TypeScript:

declare function partition<T, U extends T>(values: Iterable<T>, predicate: (value: T) => value is U): [U[], T[]];
declare function partition<T>(values: Iterable<T>, predicate: (value: T) => boolean): [T[], T[]];

declare class Animal {}
declare class Dog extends Animal { bark(): void; }
declare function isDog(value: Animal): value is Dog;
declare const animals: Animal[];

const [a, b] = partition(animals, isDog);
a[0].bark(); // ok
b[0].bark(); // TypeError

TypeScript Playground

rbuckton avatar Aug 11 '21 23:08 rbuckton

Or as { true: U[], false: T[] }, etc.

rbuckton avatar Aug 11 '21 23:08 rbuckton

@rbuckton for the question of how to handle use cases which want to avoid stringification, see https://github.com/tc39/proposal-array-grouping/issues/3.

bakkot avatar Aug 11 '21 23:08 bakkot

The decision to have object-return or Map-return is separate, I think. I'm just trying to see if TypeScript will be able to intelligently inspect the output of a groupBy keyer:

function keyer(value: Animal): "dog" | "cat" | "other" {
  return isDog(value) ? "dog" : isCat(value) ? "cat" : "other";
}

const { dog, cat, other } = animals.groupBy(keyer);

// valid?
dog[0].bark();
cat[0].meow();
other[0].name(); // assuming all animals have a name

jridgewell avatar Aug 12 '21 00:08 jridgewell

I don’t see why it wouldn’t - ReturnType<keyer> should be able to be used just fine by the groupBy types.

ljharb avatar Aug 12 '21 02:08 ljharb

There's no easy way to correlate the "is a" relationship between a string literal and a type guard, in the generic sense. At least, not in a way that just works without a lot of scaffolding:

declare function groupBy<T, M extends Record<string, T>>(
  values: Iterable<T>,
  keySelector: (value: T) => keyof M
): { [P in keyof M]: M[P][] };

You can write the generic definition, but there's no way to infer M from keySelector, so the only way to call the function would be by fully supplying the type arguments:

interface AnimalKeyMap {
  dog: Dog;
  cat: Cat;
  other: Animal;
}

// this works:
groupBy<Animal, AnimalKeyMap>(animals, animal => animal instanceof Dog ? "dog" : animal instanceof Cat ? "cat" : "other");

// this doesn't:
groupBy(animals, animal => animal instanceof Dog ? "dog" : animal instanceof Cat ? "cat" : "other");

rbuckton avatar Aug 12 '21 03:08 rbuckton

I don’t see why it wouldn’t - ReturnType<keyer> should be able to be used just fine by the groupBy types.

You would just end up with dog having type Animal[], since we can't correlate "dog" with Dog through inference.

rbuckton avatar Aug 12 '21 03:08 rbuckton

That said, I believe you can have an overload which specifically works for the partition use case mentioned above:

declare function groupBy<T, U extends T>(values: T[], predicate: (value: T) => value is U): { true: U[], false: T[] }
declare function groupBy<T>(values: T[], mapper: (value: T) => string): { [key: string] : T[] }

playground link

bakkot avatar Aug 12 '21 03:08 bakkot

A groupBy into a non iterable/iterator still feels wrong to me, especially if you have more you want to do with the grouped results.

Here's an example from the tooling we use to generate GitHub PR comments for on-demand benchmarks in TypeScript PRs:

function formatReport(benchmark: Benchmark, options: BenchmarkOptions, noStyles: boolean) {
    const cell = formatCell.bind(undefined, noStyles);
    const measurements = from(benchmark.measurements)
        .flatMap(measurement => measurement.pivot())
        .orderBy(measurement => measurement.scenarioIndex)
        .thenBy(measurement => measurement.hostIndex)
        .thenBy(measurement => measurement.metricIndex);

    return html`
    <b>Benchmark Report</b>
    <table border="0" cellpadding="0" cellspacing="0" ${noStyles ? "" : `class="comparison"`}>
        <thead>
            <tr>
                <th ${noStyles ? "align=left" : `class="metric"`}>${cell("Metric")}</th>
                <th ${noStyles ? "align=right" : `class="current mean"`}>${cell("Average")}</th>
                <th ${noStyles ? "align=right" : `class="current best"`}>${cell("Best")}</th>
                <th ${noStyles ? "align=right" : `class="current worst"`}>${cell("Worst")}</th>
            </tr>
        </thead>${
            measurements
                .groupBy(measurement => measurement.measurementName)
                .map(group => html`
        <tr class="group"><th ${noStyles ? "align=left" : ""} colspan="4">${cell(group.key)}</th></tr>
        <tbody>${
            from(group)
                .map(measurement => html`
            <tr class="${getMeasurementClassNames(measurement)}">
                <td ${noStyles ? "align=left" : `class="${getMetricClassNames(measurement)}"`}>${cell(measurement.metric)}</td>
                <td ${noStyles ? "align=right" : `class="${getCurrentMeanClassNames(measurement)}"`}>${cell(formatMean(measurement))}</td>
                <td ${noStyles ? "align=right" : `class="current best"`}>${cell(formatUnit(measurement.minimum, measurement))}</td>
                <td ${noStyles ? "align=right" : `class="current worst"`}>${cell(formatUnit(measurement.maximum, measurement))}</td>
            </tr>`)}
        </tbody>`)}
    </table>`;
}

Here I'm using a .groupBy into a .map. If the result isn't iterable, then I'd have to write

Object.entries(measurements.groupBy(measurement => measurement.measurementName))
  .map(group => html`...`)

Perhaps another approach might be to have the result of groupBy be iterable and support stringified/symbol keys, in a way not too dissimilar from the groups property of result of RegExp.prototype.exec()?

In TS parlance, something like this:

type GroupByGroups<K, V, U extends V = V> =
  & { [P in Extract<K, string | symbol>]: V[] } // handles `string | symbol`
  & { [P in Extract<K, number> as `${P}`]: V[] } // handles Number.prototype.toString()
  & { [P in Extract<K, true> as `true`]: U[] } // handles `true` -> `"true"`
  & { [P in Extract<K, false> as `false`]: V[] } // handles `false` -> `"false"`;

interface GroupByResultArray<K, V, U extends V = V> extends Array<[K, V[]]> {
  groups: GroupByGroups<K, V, U>;
}

declare function groupBy<T, U extends T>(values: Iterable<T>, keySelector: (value: T) => value is U): GroupByResultArray<boolean, T, U>;
declare function groupBy<T, K>(values: Iterable<T>, keySelector: (value: T) => K): GroupByResultArray<K, T>;

// works like this
const { true: a, false: b } = groupBy([dog, cat, bear], (dog): dog is Dog).groups; 
a; // Dog[]
b; // Animal[]

// or like this
const { foo, bar } = groupBy([{ name: "foo" }, { name: "bar" }], x => x.name).groups; 
foo; // { name: string }[]
bar; // { name: string }[]

// or like this
for (const [key, values] of groupBy(people, person => person.familyName)) {
}

In that way you get the best of both worlds, and there's parity with another language feature so the behavior is easy to explain using existing functionality.

rbuckton avatar Aug 12 '21 18:08 rbuckton

Though the above still isn't fully type safe if your keySelector returns "true" instead of true. Its not impossible to write the type, but becomes much more complex in order to handle the corner cases.

rbuckton avatar Aug 12 '21 18:08 rbuckton

@rbuckton Those comments seem like they belong in #3, rather than this thread.

bakkot avatar Aug 12 '21 18:08 bakkot

@bakkot that comment is a bit of a mixed bag, sorry. I posted it here to bring up the type safety aspects, though the other aspects of the comment would make sense for #3.

rbuckton avatar Aug 12 '21 18:08 rbuckton