dataprep
Binning of numerical columns needs to snap/align on zero...
The bins in the plot of a numerical column should preferably contain only non-negative or only non-positive values. Specifically, no single bin should contain negative values, zero, and positive values together. The crossover at 0 matters in most applications, so highlighting it automatically (by making zero a bin boundary) is consistent with being a smart EDA tool.
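To make the problem concrete, here is a minimal sketch, assuming plain equal-width binning between the data minimum and maximum (similar in spirit to the default edges numpy produces). The helper names `equal_width_edges` and `bins_straddling_zero` are hypothetical and not part of dataprep. With data spanning -3 to 8 and 4 bins, the second bin runs from -0.25 to 2.5 and mixes negative and positive values.

```python
def equal_width_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Hypothetical helper: equal-width edges from lo to hi ('blind binning')."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def bins_straddling_zero(edges: list[float]) -> list[int]:
    """Hypothetical helper: indices of bins whose interior holds both
    negative and positive values (an edge exactly at 0 does not count)."""
    return [i for i in range(len(edges) - 1) if edges[i] < 0 < edges[i + 1]]


edges = equal_width_edges(-3.0, 8.0, 4)
print(edges)                        # [-3.0, -0.25, 2.5, 5.25, 8.0]
print(bins_straddling_zero(edges))  # [1]
```

Because the edges are monotonic, at most one bin can straddle zero, so the check returns either an empty list or a single index.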
Describe the solution you'd like
The ideal solution would be for plot to detect when the range of values spans both negative and positive values and to intelligently snap/align the bin boundaries on zero.
Describe alternatives you've considered
My proposed solution may be hard to implement if the requested number of bins doesn't allow for it. In that case, I'd suggest overriding the specified number of bins. Perhaps add a flag to select 'blind binning' instead of the proposed (and default) 'smart binning'?
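One way the proposed 'smart binning' could work is sketched below, assuming we keep the bin width implied by the requested bin count and shift the grid so that 0 falls on an edge. The function name `zero_aligned_edges` is hypothetical, not an existing dataprep API. As the request anticipates, the bin count is overridden: it ends up as the requested count or one more.

```python
import math


def zero_aligned_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Hypothetical 'smart binning' sketch.

    Keeps the width (hi - lo) / n_bins, but when the range crosses zero the
    edges are placed at integer multiples of that width, so 0.0 is an edge.
    The resulting bin count is n_bins or n_bins + 1.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    if not (lo < 0 < hi):
        # Range does not cross zero: fall back to plain equal-width binning.
        return [lo + i * width for i in range(n_bins + 1)]
    n_neg = math.ceil(-lo / width)  # bins needed to cover [lo, 0]
    n_pos = math.ceil(hi / width)   # bins needed to cover [0, hi]
    # i * width is exactly 0.0 for i == 0, so zero is always an edge here.
    return [i * width for i in range(-n_neg, n_pos + 1)]


print(zero_aligned_edges(-3.0, 8.0, 4))  # [-5.5, -2.75, 0.0, 2.75, 5.5, 8.25]
```

The trade-off is that the outer bins may extend past the data range (here down to -5.5), which is the price of keeping both a fixed width and a boundary at zero. Floating-point rounding in `-lo / width` can occasionally add one extra bin; a production version would want a small tolerance there.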
Additional context
The user can easily miss the presence of negative values unless they hover over the histogram to examine the extent of each bin.
I agree, George. Tableau snaps/aligns at zero and uses natural bin endpoints, which I like:
An algorithm for calculating natural bin endpoints could be similar to how we calculate the axis tick locations. @jinglinpeng
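For illustration, one common way to pick "natural" axis ticks is to round the raw step up to 1, 2, or 5 times a power of ten and then place edges at integer multiples of that step. This is an assumed technique, not necessarily how dataprep computes its tick locations, and the names `nice_step` and `nice_edges` are hypothetical. A useful side effect is that, whenever the range crosses zero, 0 is automatically an edge because every edge is a multiple of the step.

```python
import math


def nice_step(raw_step: float) -> float:
    """Round raw_step up to 1, 2, 5, or 10 times a power of ten."""
    exponent = math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 5, 10):
        step = multiplier * 10.0 ** exponent
        if raw_step <= step:
            return step
    return 10.0 ** (exponent + 1)  # guard against log10 rounding at the boundary


def nice_edges(lo: float, hi: float, n_bins: int) -> list[float]:
    """Hypothetical sketch: edges at multiples of a 'nice' step covering [lo, hi].

    Since every edge is i * step, zero is an edge whenever lo < 0 < hi.
    The bin count is at most n_bins + 1 because the step is rounded up.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    step = nice_step((hi - lo) / n_bins)
    start = math.floor(lo / step)
    stop = math.ceil(hi / step)
    # round() trims float noise such as 3 * 0.2 == 0.6000000000000001.
    return [round(i * step, 12) for i in range(start, stop + 1)]


print(nice_edges(-3.0, 8.0, 10))  # [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
```

Rounding the step up (rather than to the nearest nice value) keeps the bin count close to the requested one; rounding to the nearest value would give finer bins at the cost of sometimes exceeding the request.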
This is what it looks like after the initial fix: