
A transform to consolidate ordinal values outside the top n into an “other” category, perhaps in conjunction with the group transform.

Status: Open. mbostock opened this issue 5 years ago • 11 comments

e.g., https://next.observablehq.com/d/0e0c0dcb66d6714e

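```javascript
// Returns an accessor that passes through values found in `domain` and maps
// any other value to `unknown` (e.g., "Other").
function other(valueof, domain, unknown) {
  // Accept a field name as shorthand for an accessor function.
  if (typeof valueof !== "function") valueof = field(valueof);
  domain = new Set(domain); // constant-time membership tests
  return (d, i, data) => {
    const value = valueof(d, i, data);
    return domain.has(value) ? value : unknown;
  };
}

// Returns an accessor for the named property.
function field(x) {
  return d => d[x];
}
```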

mbostock, Feb 24 '21

I wonder if this should be something you specify as a scale transform rather than a mark transform? Seems handy…

mbostock, Feb 24 '21

(I don't have access to https://observablehq.com/d/0e0c0dcb66d6714e)

Wouldn't doing it on the scale just amount to specifying `.unknown("Others")`? That could be interesting for individual marks (like dot), but I think we need it as a data transform for aggregate operations (bars).

Fil, Feb 25 '21

Sorry, that’s an internal dashboard. But I can try to make another example for you that uses the “other” transform.

mbostock, Feb 25 '21

My own use case for nominal "Others" is detailed in this “modalities” notebook.

Fil, Feb 25 '21

I've made some progress on this idea; it seems to work with facets: https://observablehq.com/d/0bca2cad63c75fe1

Fil, Mar 25 '21

> something you specify as a scale transform rather than a mark transform

I've tried a few ways to achieve this, by passing the domain to the scale transform in plot.js#38, but I've concluded it's a dead end. The scale transform is invoked too late: it runs after the grouping, when the aggregation (count) is already done. So even if we map all the individual groups to the same position on the screen, they will not be aggregated. For counts we could perhaps re-aggregate (summing the counts in the aggregated channel, but which channel is it?), but this would not work for other types of aggregation, such as a mean.
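A minimal sketch of the problem in plain JavaScript (no Plot APIs; the data, the reducers, and `reduceGroups` are hypothetical). Relabeling rows before grouping gives correct results for any reducer, while merging already-reduced groups afterwards only happens to work for counts: averaging per-group means is wrong whenever the merged groups have different sizes.

```javascript
// Hypothetical rows: a category and a numeric value.
const rows = [
  {cat: "a", v: 1}, {cat: "a", v: 3}, {cat: "a", v: 5},
  {cat: "b", v: 10}, {cat: "b", v: 20},
  {cat: "c", v: 40}
];

// Two reducers over an array of rows.
const count = (group) => group.length;
const mean = (group) => group.reduce((sum, d) => sum + d.v, 0) / group.length;

// Group rows by category and reduce each group to a single value.
function reduceGroups(rows, reduce) {
  const groups = new Map();
  for (const d of rows) {
    if (!groups.has(d.cat)) groups.set(d.cat, []);
    groups.get(d.cat).push(d);
  }
  return new Map([...groups].map(([key, group]) => [key, reduce(group)]));
}

const keep = new Set(["a"]); // every other category becomes "Others"

// Early: relabel the rows before grouping, then reduce.
const relabeled = rows.map((d) => ({...d, cat: keep.has(d.cat) ? d.cat : "Others"}));
const earlyCount = reduceGroups(relabeled, count).get("Others");
const earlyMean = reduceGroups(relabeled, mean).get("Others");

// Late: reduce first, then try to merge the already-reduced values of the
// categories that would map to "Others" (mirroring the order described above).
const lateCounts = reduceGroups(rows, count);
const lateMeans = reduceGroups(rows, mean);
const otherKeys = [...lateCounts.keys()].filter((key) => !keep.has(key));
const lateCount = otherKeys.reduce((sum, key) => sum + lateCounts.get(key), 0);
const lateMean = otherKeys.reduce((sum, key) => sum + lateMeans.get(key), 0) / otherKeys.length;

console.log(earlyCount, lateCount); // 3 3: summing counts recovers the right answer
console.log(earlyMean, lateMean);   // about 23.33 vs 27.5: averaging means does not
```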

Fil, Mar 26 '21

This solution works on x, where `others` and `k` (default 10) are options of the group reducer.

```diff
--- a/src/transforms/group.js
+++ b/src/transforms/group.js
@@ -67,7 +67,7 @@ function groupn(
   // The z, fill, and stroke channels (if channels and not constants) are
   // greedily materialized by the transform so that we can reference them for
   // subdividing groups without having to compute them more than once.
-  const {z, fill, stroke, ...options} = inputs;
+  const {z, fill, stroke, others, k = 10, ...options} = inputs;
   const [BZ, setBZ] = maybeLazyChannel(z);
   const [vfill] = maybeColor(fill);
   const [vstroke] = maybeColor(stroke);
@@ -84,6 +84,15 @@ function groupn(
     ...Object.fromEntries(outputs.map(({name, output}) => [name, output])),
     transform: maybeTransform(options, (data, facets) => {
       const X = valueof(data, x);
+      if (others && X) {
+        const domain0 = sort(grouper(X, d => d), ([,{length}]) => -length);
+        if (domain0.length > k + 1) {
+          const domain = new Set(domain0.slice(0, k).map(d => d[0]));
+          for (let i = 0; i < X.length; i++) {
+            if (!domain.has(X[i])) X[i] = others;
+          }
+        }
+      }
       const Y = valueof(data, y);
```
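Outside of Plot, the core of the patch is a frequency count followed by an in-place relabeling. Here is a self-contained sketch of that logic; `collapseOthers` is a hypothetical name, and the Map-based counting stands in for the `grouper` and `sort` helpers used in the patch. Note the `k + 1` test: if only one value falls outside the top k, nothing is collapsed, since an "others" bucket holding a single category would merely rename it.

```javascript
// Replace every value of X outside the k most frequent ones with `others`,
// mutating X in place (as the patch does with the materialized X channel).
function collapseOthers(X, others, k = 10) {
  // Count occurrences of each distinct value.
  const counts = new Map();
  for (const x of X) counts.set(x, (counts.get(x) ?? 0) + 1);
  // Distinct values, most frequent first (the sort is stable, so ties keep
  // first-seen order).
  const domain0 = [...counts].sort(([, a], [, b]) => b - a);
  // Only collapse if at least two distinct values would be merged.
  if (domain0.length > k + 1) {
    const domain = new Set(domain0.slice(0, k).map(([value]) => value));
    for (let i = 0; i < X.length; i++) {
      if (!domain.has(X[i])) X[i] = others;
    }
  }
  return X;
}

collapseOthers(["a", "a", "a", "b", "b", "c", "d"], "Others", 2);
// → ["a", "a", "a", "b", "b", "Others", "Others"]
collapseOthers(["a", "a", "b", "c"], "Others", 2);
// → ["a", "a", "b", "c"] (unchanged: only "c" falls outside the top 2)
```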

[Screenshot 2021-03-26 at 15.59.19]

EDIT: I don't think we should pursue this direction, since the modalities function defined in this notebook returns both the channel and a domain that we can use in the scale definition. That is enough for the purpose, and in line with https://github.com/observablehq/plot/pull/271#issuecomment-806311774.
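A sketch of the "channel plus domain" shape described above. The notebook's modalities function is not reproduced here; `topModalities` is a hypothetical stand-in that keeps the k most frequent values, relabels the rest, and returns a domain with the "others" bucket last so it can be handed to the scale.

```javascript
// Hypothetical helper in the spirit of the modalities function: returns the
// relabeled channel values *and* a matching domain for the scale.
function topModalities(values, k, others = "Others") {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  // The k most frequent values, in descending order of frequency.
  const top = [...counts].sort(([, a], [, b]) => b - a).slice(0, k).map(([v]) => v);
  const keep = new Set(top);
  const channel = values.map((v) => (keep.has(v) ? v : others));
  // Put the "others" bucket last, and only if something was folded into it.
  const domain = counts.size > k ? [...top, others] : top;
  return {channel, domain};
}

const {channel, domain} = topModalities(["x", "y", "y", "z", "y", "x", "w"], 2);
// channel: ["x", "y", "y", "Others", "y", "x", "Others"]
// domain:  ["y", "x", "Others"]
```

Presumably `channel` would feed the mark (for instance as the `x` option of a group transform, so grouping happens on the already-relabeled values) while `domain` goes to the scale definition; that wiring is an assumption, not something this thread spells out.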

Fil, Mar 26 '21

We now have `sort: {fx: {value: …, limit}}` in #442; the only thing missing is "others".
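A rough sketch, in plain JavaScript rather than Plot, of what a limited sort does and what it leaves out: order the keys by a reduced value and keep the first `limit`. The helper `limitedDomain` and the data are hypothetical, and summing is just an illustrative choice of reducer.

```javascript
// Hypothetical helper: order the keys of `rows` by the sum of `value` per key,
// descending, and keep only the first `limit` keys as the domain.
function limitedDomain(rows, key, value, limit) {
  const totals = new Map();
  for (const d of rows) totals.set(d[key], (totals.get(d[key]) ?? 0) + d[value]);
  return [...totals]
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
    .map(([k]) => k);
}

// Hypothetical data.
const rows = [
  {region: "north", sales: 5},
  {region: "south", sales: 9},
  {region: "east", sales: 2},
  {region: "west", sales: 1},
  {region: "north", sales: 3}
];

const domain = limitedDomain(rows, "region", "sales", 2);
// domain: ["south", "north"] (totals 9 and 8)

// With a limited domain alone, rows whose key falls outside it have nowhere to
// go; an "others" option would fold them into one extra category instead.
const dropped = rows.filter((d) => !domain.includes(d.region));
const droppedTotal = dropped.reduce((sum, d) => sum + d.sales, 0);
// droppedTotal: 3 (east 2 + west 1)
```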

Fil, Aug 19 '21

Some more pairing on this, led by Fil: https://observablehq.com/d/f3aac7d647ef1c9e


tophtucker, Apr 04 '23

A more advanced experiment: https://observablehq.com/@observablehq/plot-stacking-others-144

Fil, Jul 20 '23

Yet another take: https://observablehq.com/@observablehq/plot-group-top-n-with-others

Fil, Mar 10 '25