Orson Peters
> > Removal of allow_duplicates and change of semantics. Instead of first choosing the quantile values based on the breakpoints and then filtering (leading to problems), we instead semantically sort...
> That's just not true. Ask any statistician or refer to literature. Quantiles are, in a way, the inverse of a probability distribution. I would be very interested in...
> In other words, one calculates empirical quantiles of a given sample/data. [...] So you are using the order statistic of a given sample/data set. That's statistics with a notion...
> Then there is the point of specifying which quantiles exactly. For the naming, I would at least look at what other libraries and languages do: None of the linked...
@lorentzenchr Perhaps you'll find my argument more convincing if you consider that both `cut` and `qcut` are perfectly well-defined functions over series of *strings*.
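To make that concrete, here is a minimal pure-Python sketch. The helper names `cut_by_breaks` and `qcut_by_rank` are hypothetical and are not part of any library; the sketch assumes only that the values support `<`, and it makes no attempt to keep equal values in the same bin.

```python
from bisect import bisect_left


def cut_by_breaks(values, breaks):
    """Return, for each value, the index of the bin it falls in.

    Bin i holds values v with breaks[i-1] < v <= breaks[i]. Only ordering
    comparisons are used, so strings work as well as numbers.
    """
    breaks = sorted(breaks)
    return [bisect_left(breaks, v) for v in values]


def qcut_by_rank(values, n_bins):
    """Split values into n_bins groups of near-equal size by sorted rank.

    No arithmetic is done on the values themselves, only on their ranks.
    Note: in this sketch, equal values may end up in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


words = ["pear", "apple", "fig", "kiwi", "banana", "cherry"]
print(cut_by_breaks(words, ["c", "k"]))  # bins by string breakpoints
print(qcut_by_rank(words, 3))            # three equal-sized rank bins
```

Neither function needs a probability distribution or even a numeric type, only a total order on the values.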
Another bikeshed suggestion: `bin` and `bin_sorted`?
> A cut_rank or cut_partition would be fine and avoid some confusion. But to me, it is a different/additional function doing something different.

> Modern gradient boosting libs like XGBoost,...
Alright, proposed redesign, take two. We remove `cut` and `qcut` (which I honestly feel are pretty bad names anyway) in favor of three binning functions: ```python def bin_intervals( self, intervals:...
> I'm unsure how an argument regarding the connection between quantiles and probability led to the need for changing two existing function names, their parameter names, and their default values,...
@xuJ14
> I think 'drop' should be added, since sometimes we do not want multiple values for the breakpoint.

That's what first / last do, they just allow more flexibility...