
dl.histogram returning repeated bins

Open mathisonian opened this issue 5 years ago • 2 comments

Hey all,

This library has been extremely useful, but I'm hitting an issue with the histogram method. A simple reproduction is available here: https://observablehq.com/d/cfefd9c0c388f478

The issue is that the returned bin values seem to repeat when a high number of bins is used. For example, I'd expect to get bins with values of ..., .27, .28, .29, ... but instead I get two bins at .28 and no bin at .29.


It's possible that I'm doing something wrong with the options (or just being generally unreasonable), in which case please let me know.

mathisonian, Jun 25 '20 23:06

Hmm, that looks like a rounding bug introduced by the value call on this line: https://github.com/vega/datalib/blob/master/src/bins/histogram.js#L60

The corresponding function (bins.js#L72) is:

function value(v) {
  return this.step * Math.floor(v / this.step + EPSILON);
}

Meanwhile, the equivalent call in Vega's Bin transform is a bit different. IIRC, I rewrote it to handle these floating point issues:

this.start + this.step * Math.floor(EPSILON + (v - this.start) / this.step)
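To make the rounding behaviour concrete, here is a minimal sketch that copies the two formulas above into stand-alone functions and feeds them the values from the report. The function names, the step of 0.01, the start of 0, and both tolerance values are assumptions for illustration; the actual EPSILON constants are not quoted in this thread, so the tolerance is passed in as a parameter.

```javascript
// Stand-alone copies of the two formulas quoted in this thread.
// `eps` is passed in explicitly: the real EPSILON constants are not shown here.
function valueNoStart(v, step, eps) {
  return step * Math.floor(v / step + eps);
}

function valueWithStart(v, start, step, eps) {
  return start + step * Math.floor(eps + (v - start) / step);
}

const step = 0.01; // assumed bin width, matching the .27, .28, .29 example

// In IEEE 754 doubles, 0.29 / 0.01 lands just below 29, so flooring yields 28.
const rawIndex = Math.floor(0.29 / step);
console.log(0.29 / step, rawIndex);

// Assumed tolerances, for illustration only.
const small = 1e-15; // smaller than the rounding error near 29
const large = 1e-14; // larger than the rounding error near 29

// A tolerance that is too small leaves 0.29 in the same bin as 0.28.
console.log(valueNoStart(0.28, step, small), valueNoStart(0.29, step, small));

// With start = 0 both formulas agree, so the tolerance decides the outcome here.
console.log(
  valueWithStart(0.29, 0, step, small),
  valueWithStart(0.29, 0, step, large)
);
```

In this sketch the duplicate bin appears whenever the tolerance is smaller than the error in v / step, so any patch would probably need a test that covers both a zero and a non-zero start.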

I'm not sure when I will have the time to patch and test, but in the meantime would welcome a PR if you try the change yourself.

jheer, Jun 26 '20 08:06

Thanks @jheer! I figured it was some rounding issue. These pointers are helpful; I can take a stab at a patch and test in the next couple of days.

mathisonian, Jun 26 '20 15:06