
Binning of a numerical column needs to snap/align on zero...

Open overcoil opened this issue 4 years ago • 2 comments

Each bin in the plot of a numerical column should preferably contain only non-negative or only non-positive values. Specifically, no single bin should contain negative values, zero, and positive values together. The crossover at 0 is important in most applications, so highlighting it automatically (by making zero a bin boundary) is consistent with being a smart EDA tool.

Describe the solution you'd like

The ideal solution would be for plot to detect when the range of values spans both positive and negative values, and to snap/align a bin boundary on zero.

Describe alternatives you've considered

My proposed solution may be hard to implement if the requested number of bins doesn't allow a boundary at zero. In that case, I'd suggest overriding the specified number of bins. Perhaps add a flag to select 'blind binning' instead of my proposed (and default) 'smart binning'?
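To make the proposal concrete, here is a minimal sketch of zero-aligned binning, assuming equal-width bins. The names `zero_aligned_edges` and `snap_to_zero` are hypothetical and not part of dataprep's API. When the range spans zero, the sketch keeps the requested bin width but may return more bins than requested, which is the override suggested above.

```python
import math


def zero_aligned_edges(lo, hi, bins, snap_to_zero=True):
    """Return equal-width bin edges covering [lo, hi].

    Hypothetical helper: with snap_to_zero=True ('smart binning') and a
    range that spans zero, 0 is forced to be an edge, possibly at the
    cost of returning more bins than requested. With snap_to_zero=False
    ('blind binning') the range is simply split into `bins` equal parts.
    """
    if bins < 1:
        raise ValueError("bins must be at least 1")
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")

    width = (hi - lo) / bins
    if not snap_to_zero or not (lo < 0 < hi):
        # Blind binning, or nothing to snap: the range does not cross zero.
        return [lo + i * width for i in range(bins + 1)]

    # Count how many bins of the requested width are needed on each side
    # of zero, then lay the edges out as integer multiples of the width.
    n_neg = math.ceil(-lo / width)
    n_pos = math.ceil(hi / width)
    return [i * width for i in range(-n_neg, n_pos + 1)]
```

For example, `zero_aligned_edges(-3, 7, 4)` returns `[-5.0, -2.5, 0.0, 2.5, 5.0, 7.5]`: five bins instead of the requested four, none of which mixes negative and positive values.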

Additional context

The user can easily miss the presence of negative values unless they hover over the histogram to examine the extent of the bin.

Screen Shot 2020-07-22 at 9 27 27 PM

overcoil, Jul 23 '20 04:07

I agree, George. Tableau snaps/aligns at zero and uses natural bin endpoints, which I like:

Screen Shot 2020-07-24 at 11 42 00 AM

An algorithm for calculating natural bin endpoints could be similar to how we calculate the axis tick locations. @jinglinpeng
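As a rough illustration of what "natural bin endpoints" could mean, here is a sketch based on the common "nice numbers" approach to axis ticks, where the step is rounded up to 1, 2, or 5 times a power of ten. This is an assumption about the general technique, not a description of how dataprep computes its ticks, and `nice_step` and `natural_edges` are hypothetical names.

```python
import math


def nice_step(raw_step):
    """Round raw_step up to the nearest 1, 2, or 5 times a power of ten."""
    exponent = math.floor(math.log10(raw_step))
    fraction = raw_step / 10 ** exponent
    for nice in (1, 2, 5):
        if fraction <= nice:
            return nice * 10 ** exponent
    return 10 * 10 ** exponent


def natural_edges(lo, hi, bins):
    """Return bin edges at multiples of a 'nice' step that cover [lo, hi].

    `bins` is treated as a target only; the number of bins returned can
    differ. Because every edge is an integer multiple of the step, zero
    is automatically an edge whenever the range spans it.
    """
    if bins < 1 or not lo < hi:
        raise ValueError("need bins >= 1 and lo < hi")
    step = nice_step((hi - lo) / bins)
    first = math.floor(lo / step)
    last = math.ceil(hi / step)
    return [i * step for i in range(first, last + 1)]
```

With this sketch, `natural_edges(-3, 7, 4)` gives `[-5, 0, 5, 10]`, so the endpoints are round numbers and zero falls on a boundary without any special casing.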

brandonlockhart, Jul 24 '20 19:07

This is what it looks like after the initial fix:

Screen Shot 2020-08-26 at 3 40 10 PM

peshotan, Aug 26 '20 22:08