plot icon indicating copy to clipboard operation
plot copied to clipboard

Auto: error on underspecified reduce

Open tophtucker opened this issue 2 years ago • 7 comments
trafficstars

Watching people use the chart cell, they often set a reducer without understanding what it’s reducing. Min of what, max of what, mean of what? Setting most reducers without a field is incoherent; even if it produces something, it’ll be misleading. But people often don’t realize they have to do something else, because they see some output, and scratch their heads trying to interpret it.

Since it’s easy to statically tell if a configuration is invalid like this, this PR throws an error in those cases. Demo: https://observablehq.com/d/932dd87a4c129e16

image

I tried to think of a more plain-English way to say this. We could do something like Error: ${channel} is "${reducer}" of what? reducer requires ${channel} field. People don’t know the word “reducer”, but they know the prepositional relationship expressed in “mean” of “height”. It’s gotta be of something.

Broken out from https://github.com/observablehq/plot/pull/1424, where we previously discussed whether "count" is really the only reducer that doesn’t need this error:

Fil:

sum does not need an input (it defaults to count); distinct works too I think, also giving count. min-index and max-index will always give 0. And then there are the x1, x2 reducers (for binX), which give the bins' bounds.

Toph:

Even though sum and distinct work as count, I've seen several people get confused by selecting "distinct" in the chart cell and seeing nothing change, so I'm still sort of inclined to throw (or warn?) on them.

tophtucker avatar Aug 24 '23 15:08 tophtucker

Another potential direction is issue a warning when the input is more than 50% nullish or NaN, and work on hooking up Plot’s warnings to be displayed by the chart cell (#1192, although that issue presupposes a specific solution and we may want to consider other approaches).

mbostock avatar Aug 24 '23 15:08 mbostock

Oh, wait. 🤔💡 The reason this isn’t an error is because if the input channel doesn’t exist, then the data itself is fed to the reducer in lieu of the channel:

ohhh ya. makes sense. plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to:

image

so this error wouldn't limit plot.auto any more than it's already limited. though it'd add some inertia to the limit.

Another potential direction is issue a warning when the input is more than 50% nullish or NaN

is the warning approach only bc of the shorthand case? at one point i think you described to me that things should be errors when they're statically incoherent in the options, or warnings if they're about the dynamic data. for plot.auto as it stands today it feels more like an error.

(https://github.com/observablehq/plot/issues/1192, although that issue presupposes a specific solution and we may want to consider other approaches).

yeah good aside, i've rewritten that issue description to be more agnostic 👍

tophtucker avatar Aug 24 '23 20:08 tophtucker

plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to

Plot.auto doesn’t support shorthand (you have to specify some options), but it does support primitive arrays:

untitled (66)

Plot.auto(d3.range(1000).map(d3.randomNormal()), {x: Plot.identity}).plot()

mbostock avatar Aug 24 '23 21:08 mbostock

Ohh right right. Here's the behavior of inputless reducers on primitive arrays w/ Plot.auto today:

image

It's tantalizing because it feels like almost certainly a case of user confusion.

bypass the check if data is an array of primitives (numbers, dates, strings, booleans, etc.)

feels right to me. (you can't tell from the materialized columns, right?) what do you think, same approach as arrayIsPrimitive in stdlib? or maybe we just check the first row or something? what's the plot norm for inference… https://github.com/observablehq/stdlib/blob/main/src/table.js#L86

tophtucker avatar Aug 25 '23 16:08 tophtucker

took a stab at it throwing only if not primitive (using a simpler primitivity check than stdlib table stuff). doesn't feel great to iterate over the data once more each time 🤷

tophtucker avatar Aug 25 '23 21:08 tophtucker

@mbostock I think I’ve addressed all comments:

  • Check isIterable before isPrimitive
  • Treat bigint as primitive
  • Write out "or" by hand instead of array.includes
  • Fix it.only

tophtucker avatar Sep 13 '23 16:09 tophtucker

Codewise this lgtm. But can we rephrase the error message a little bit? setting y reducer to "mean" requires setting y field uses "reducer" and "field", but the properties are named reduce and value. Maybe something like “a value is required when setting the reduce to "mean"”, “provide a value to reduce y to "mean"”, “missing y value to reduce with "mean"”, or just “missing value for y”.

Fil avatar Oct 03 '23 14:10 Fil