plot
plot copied to clipboard
Auto: error on underspecified reduce
Watching people use the chart cell, they often set a reducer without understanding what it’s reducing. Min of what, max of what, mean of what? Setting most reducers without a field is incoherent; even if it produces something, it’ll be misleading. But people often don’t realize they have to do something else, because they see some output, and scratch their heads trying to interpret it.
Since it’s easy to statically tell if a configuration is invalid like this, this PR throws an error in those cases. Demo: https://observablehq.com/d/932dd87a4c129e16
I tried to think of a more plain-English way to say this. We could do something like Error: ${channel} is "${reducer}" of what? reducer requires ${channel} field. People don’t know the word “reducer”, but they know the prepositional relationship expressed in “mean” of “height”. It’s gotta be of something.
Broken out from https://github.com/observablehq/plot/pull/1424, where we previously discussed whether "count" is really the only reducer that doesn’t need this error:
Fil:
sum does not need an input (it defaults to count); distinct works too I think, also giving count. min-index and max-index will always give 0. And then there are the x1, x2 reducers (for binX), which give the bins' bounds.
Toph:
Even though sum and distinct work as count, I've seen several people get confused by selecting "distinct" in the chart cell and seeing nothing change, so I'm still sort of inclined to throw (or warn?) on them.
Another potential direction is issue a warning when the input is more than 50% nullish or NaN, and work on hooking up Plot’s warnings to be displayed by the chart cell (#1192, although that issue presupposes a specific solution and we may want to consider other approaches).
Oh, wait. 🤔💡 The reason this isn’t an error is because if the input channel doesn’t exist, then the data itself is fed to the reducer in lieu of the channel:
ohhh ya. makes sense. plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to:
so this error wouldn't limit plot.auto any more than it's already limited. though it'd add some inertia to the limit.
Another potential direction is issue a warning when the input is more than 50% nullish or NaN
is the warning approach only bc of the shorthand case? at one point i think you described to me that things should be errors when they're statically incoherent in the options, or warnings if they're about the dynamic data. for plot.auto as it stands today it feels more like an error.
(https://github.com/observablehq/plot/issues/1192, although that issue presupposes a specific solution and we may want to consider other approaches).
yeah good aside, i've rewritten that issue description to be more agnostic 👍
plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to
Plot.auto doesn’t support shorthand (you have to specify some options), but it does support primitive arrays:
Plot.auto(d3.range(1000).map(d3.randomNormal()), {x: Plot.identity}).plot()
Ohh right right. Here's the behavior of inputless reducers on primitive arrays w/ Plot.auto today:
It's tantalizing because it feels like almost certainly a case of user confusion.
bypass the check if data is an array of primitives (numbers, dates, strings, booleans, etc.)
feels right to me. (you can't tell from the materialized columns, right?) what do you think, same approach as arrayIsPrimitive in stdlib? or maybe we just check the first row or something? what's the plot norm for inference… https://github.com/observablehq/stdlib/blob/main/src/table.js#L86
took a stab at it throwing only if not primitive (using a simpler primitivity check than stdlib table stuff). doesn't feel great to iterate over the data once more each time 🤷
@mbostock I think I’ve addressed all comments:
- Check isIterable before isPrimitive
- Treat bigint as primitive
- Write out "or" by hand instead of array.includes
- Fix it.only
Codewise this lgtm. But can we rephrase the error message a little bit? setting y reducer to "mean" requires setting y field uses "reducer" and "field", but the properties are named reduce and value. Maybe something like “a value is required when setting the reduce to "mean"”, “provide a value to reduce y to "mean"”, “missing y value to reduce with "mean"”, or just “missing value for y”.