
Rendering with maxbins when data have data points less than the number of bins themselves

Open Mahesha999 opened this issue 3 years ago • 1 comments

I have the following data:

cdf_data = [
  { d_percentages: 0, student_percentages: 35 },
  { d_percentages: 10, student_percentages: 42 },
  { d_percentages: 20, student_percentages: 55 },
  { d_percentages: 30, student_percentages: 75 },
  { d_percentages: 40, student_percentages: 85 },
  { d_percentages: 50, student_percentages: 91 },
  { d_percentages: 60, student_percentages: 96 },
  { d_percentages: 70, student_percentages: 98 },
  { d_percentages: 80, student_percentages: 98 },
  { d_percentages: 90, student_percentages: 100 },
  { d_percentages: 100, student_percentages: 100 }
]

I created the following visualization:

cdf_in_js_with_minbins = {
  const plot = vl.markBar()
    .data(cdf_data)
    .encode(
      vl.y()
        .fieldQ('student_percentages'),
      vl.x()
        .fieldQ('d_percentages')//.bin(true)
        .scale({ "domain": [0, 100] })
        .bin({ minbins: 10 })
    ).width(500).height(250);
  
  return plot.render();
}

This outputs:

image

Initially, before using minbins: 10 as above, I had tried maxbins: 30, which rendered the following:

image

This confused me a lot, especially because there are two bars in the 90-100 range. Also, nowhere does cdf_data say that the 0-5 range has 35% of students and the 5-10 range has 0% of students. I expected that, maxbins being an upper limit, the plot would end up showing just 10 bins, as in the first figure. Instead, it created 20 bins. Am I misunderstanding something here, or is it a bug?

Here is the observablehq notebook rendering both plots.

Mahesha999 avatar Jun 13 '22 08:06 Mahesha999

I think this is as expected. If you have prebinned data, use the binned property. The last bin includes its upper bound rather than excluding it. The actual number of bins depends only on the range and not on the data.
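
To illustrate why the bin count follows from the range alone, here is a simplified sketch of a "nice" step search. It is modeled on my understanding of Vega's default behavior (base 10, divisors 5 and 2) and is not the library's actual code; the name `chooseStep` is hypothetical. For the range [0, 100], a limit of 30 settles on a step of 5 (20 bins), while a limit of 10 settles on a step of 10 (10 bins).

```javascript
// Simplified sketch of a "nice" bin-step search.
// Assumption: base 10 and divisors [5, 2], mirroring what Vega appears to
// use by default. chooseStep is a hypothetical name, not a library function.
function chooseStep(extent, maxbins, base = 10, divide = [5, 2]) {
  const span = extent[1] - extent[0];
  const logb = Math.log(base);

  // Start from a power of the base that is small relative to the span.
  const level = Math.ceil(Math.log(maxbins) / logb);
  let step = Math.pow(base, Math.round(Math.log(span) / logb) - level);

  // Grow the step until the bin count fits under the limit.
  while (Math.ceil(span / step) > maxbins) step *= base;

  // Then try finer "nice" subdivisions, keeping any that still fit.
  for (const d of divide) {
    const candidate = step / d;
    if (span / candidate <= maxbins) step = candidate;
  }
  return step;
}

const extent = [0, 100];
const step30 = chooseStep(extent, 30); // 5
const step10 = chooseStep(extent, 10); // 10
const bins30 = (extent[1] - extent[0]) / step30; // 20
const bins10 = (extent[1] - extent[0]) / step10; // 10

console.log({ step30, bins30, step10, bins10 });
```

With a step of 5, the value 0 lands in 0-5 while 5-10 stays empty, and the values 90 and 100 land in 90-95 and 95-100 respectively (the last bin being inclusive of 100), which would account for the two bars in the 90-100 range.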

domoritz avatar Jun 13 '22 15:06 domoritz
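
For reference, the binned suggestion above could look roughly like the sketch below, written as a plain Vega-Lite JSON spec rather than with vega-lite-api. It assumes each row of cdf_data marks the start of a 10-point-wide interval; that reading, the constant `BIN_WIDTH`, and the field names `bin_start` and `bin_end` are assumptions for illustration, not something stated in the question. Under this reading the final row (100) spans 100-110, so whether to keep, drop, or clamp it is a judgment call; the sketch keeps it.

```javascript
// Data copied from the question.
const cdf_data = [
  { d_percentages: 0, student_percentages: 35 },
  { d_percentages: 10, student_percentages: 42 },
  { d_percentages: 20, student_percentages: 55 },
  { d_percentages: 30, student_percentages: 75 },
  { d_percentages: 40, student_percentages: 85 },
  { d_percentages: 50, student_percentages: 91 },
  { d_percentages: 60, student_percentages: 96 },
  { d_percentages: 70, student_percentages: 98 },
  { d_percentages: 80, student_percentages: 98 },
  { d_percentages: 90, student_percentages: 100 },
  { d_percentages: 100, student_percentages: 100 }
];

// Assumption: every row is the left edge of a 10-wide bin.
const BIN_WIDTH = 10;

// Give each row explicit bin boundaries (hypothetical field names).
const binnedData = cdf_data.map(d => ({
  bin_start: d.d_percentages,
  bin_end: d.d_percentages + BIN_WIDTH,
  student_percentages: d.student_percentages
}));

// Vega-Lite spec that treats the data as already binned, so no
// automatic binning (and no step search) is applied to the x field.
const spec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: { values: binnedData },
  mark: "bar",
  width: 500,
  height: 250,
  encoding: {
    x: {
      field: "bin_start",
      type: "quantitative",
      bin: { binned: true, step: BIN_WIDTH }
    },
    x2: { field: "bin_end" },
    y: { field: "student_percentages", type: "quantitative" }
  }
};

console.log(JSON.stringify(spec.encoding, null, 2));
```

The resulting object can be passed to any Vega-Lite renderer; because the bin boundaries are supplied by the data, each row gets exactly one bar.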