
The bin transform should detect integers

Open mbostock opened this issue 3 years ago • 2 comments

It’d be nice if the bin transform were smart enough to detect integers and make sure that the bin size is not fractional. For example, consider the Chinook dataset, where the MediaTypeId column is a number whose value is 1, 2, 3, 4, or 5:

untitled (50)

Plot.plot({
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId"})),
    Plot.ruleY([0])
  ]
})

If the bin transform detected integers automatically, you could get something like this instead, which currently requires setting the interval option explicitly:

untitled (51)

Plot.plot({
  x: {
    interval: 1
  },
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId", interval: 1})),
    Plot.ruleY([0])
  ]
})
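As a rough illustration of what "detecting integers" could mean, here is a minimal sketch in plain JavaScript. The helpers `allIntegers` and `integerAwareStep` are hypothetical names invented for this example; they are not part of Plot and do not reflect how the bin transform is implemented. The sketch assumes the transform has already proposed some bin step and only needs to keep it from being fractional.

```javascript
// Hypothetical helper (not part of Plot): returns true if at least one value
// is present and every defined, non-NaN value is an integer.
function allIntegers(values) {
  let seen = false;
  for (const v of values) {
    if (v == null || Number.isNaN(v)) continue; // skip missing values
    if (!Number.isInteger(v)) return false;
    seen = true;
  }
  return seen;
}

// Hypothetical helper (not part of Plot): given a bin step proposed by some
// other heuristic, round it up to a whole number when the data are integers.
function integerAwareStep(values, proposedStep) {
  return allIntegers(values)
    ? Math.max(1, Math.ceil(proposedStep))
    : proposedStep;
}
```

With values such as the MediaTypeId column described above (1 through 5), a proposed step of 0.5 would be raised to 1, while data containing any fractional value would keep the proposed step unchanged.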

Potentially this could also work with the interval-aware default tick format (#932).

untitled (52)

Plot.plot({
  x: {
    interval: 1,
    tickFormat: ""
  },
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId", interval: 1})),
    Plot.ruleY([0])
  ]
})

Though, I suppose the group transform would be even better here…

untitled (53)

Plot.plot({
  marks: [
    Plot.barY(tracks, Plot.groupX({y: "count"}, {x: "MediaTypeId"})),
    Plot.ruleY([0])
  ]
})

mbostock avatar Jul 27 '22 19:07 mbostock

Related: #932 #355 #734

When switching to groups (which gives a better histogram in this case), there is a risk of not showing groups with no data, so the interval option is still needed.
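To make the empty-group concern concrete, here is a small sketch of one way to build a gap-free integer domain by hand. The helper `integerDomain` is a hypothetical name for this example and is not part of Plot; it only shows the idea of enumerating every integer between the observed minimum and maximum so that values with no data still get a slot.

```javascript
// Hypothetical helper (not part of Plot): returns every integer from the
// smallest to the largest integer value present, including those that never
// occur in the data. Non-integer and missing values are ignored.
function integerDomain(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (!Number.isInteger(v)) continue;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const domain = [];
  for (let i = min; i <= max; ++i) domain.push(i);
  return domain; // empty if no integers were found
}
```

An array like this could be passed as an explicit domain for the x scale, so that an integer absent from the data still appears as an empty band.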

Fil avatar Jul 27 '22 21:07 Fil

There’s probably a similar enhancement here with temporal data: e.g., if the values are all at UTC midnights, then we shouldn’t choose a bin threshold shorter than d3.utcDay. But testing for lots of time intervals (seconds, minutes, hours, days, weeks, months, years) might be slow… though maybe still fast enough to be worth doing.
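A minimal sketch of such a check, in plain JavaScript, might look like the following. The name `coarsestUtcInterval` and the candidate list are assumptions made for this example, not Plot or D3 API. It covers only fixed-duration intervals (seconds through days); weeks, months, and years would need calendar-aware tests and are left out.

```javascript
// Assumed candidate list, coarsest first. Each entry is a fixed duration in
// milliseconds; a timestamp aligned to UTC midnight is a multiple of one day.
const utcDurations = [
  ["day", 86400000],
  ["hour", 3600000],
  ["minute", 60000],
  ["second", 1000]
];

// Hypothetical helper (not part of Plot or D3): returns the name of the
// coarsest candidate interval that every date is aligned to, or
// "millisecond" if none of them fits. An empty input trivially matches the
// coarsest candidate.
function coarsestUtcInterval(dates) {
  const times = Array.from(dates, (d) => +d); // coerce Date to epoch ms
  for (const [name, ms] of utcDurations) {
    if (times.every((t) => t % ms === 0)) return name;
  }
  return "millisecond";
}
```

Each candidate costs at most one pass over the data and stops at the first misaligned value, so the total work is bounded by the number of candidates times the number of values. Whether that is fast enough in practice would need measuring.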

mbostock avatar Jul 28 '22 05:07 mbostock

Another example in the wild: https://twitter.com/slothstats/status/1664091552627539968

image

mbostock avatar Jun 01 '23 15:06 mbostock

This can be generalized to other intervals, too. For example, if you have daily data, you don’t want the bin transform using hourly bins.

mbostock avatar Aug 23 '23 20:08 mbostock