Orson Peters comments

Results: 237 comments of Orson Peters

Redesign `cut` and `qcut`

> > Removal of allow_duplicates and change of semantics. Instead of first choosing the quantile values based on the breakpoints and then filtering (leading to problems), we instead semantically sort...

Redesign `cut` and `qcut`

> That's just not true. Ask any statistician or refer to literature. Quantiles are, in a way, the inverse of a probability distribution. I would be very interested in...

Redesign `cut` and `qcut`

> In other words, one calculates empirical quantiles of a given sample/data. [...] So you are using the order statistic of a given sample/data set. That's statistics with a notion...

Redesign `cut` and `qcut`

> Then there is the point of specifying which quantiles exactly. For the naming, I would at least look at what other libraries and languages do: None of the linked...

Redesign `cut` and `qcut`

@lorentzenchr Perhaps you will find my argument more convincing if you consider that both `cut` and `qcut` are perfectly well-defined functions over series of *strings*.
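To make the point about strings concrete, here is a minimal sketch in plain Python. It is not the Polars implementation or API: `cut_sketch` and `qcut_sketch` are hypothetical helper names, right-closed bins are an assumed convention, and ties in the rank-based version are broken by input position. The only operation applied to the values is comparison, which is why strings work as well as numbers.

```python
from bisect import bisect_left


def cut_sketch(values, breaks):
    """Hypothetical helper: bin index of each value for sorted breakpoints.

    Bin i holds values v with breaks[i-1] < v <= breaks[i] (right-closed,
    an assumed convention). Only ordering is used, never arithmetic.
    """
    breaks = sorted(breaks)
    return [bisect_left(breaks, v) for v in values]


def qcut_sketch(values, n_bins):
    """Hypothetical helper: rank-based binning into n_bins groups.

    Values are ranked by sorting; rank r goes to bin r * n_bins // n.
    Equal values may land in different bins in this simple sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


words = ["pear", "apple", "fig", "kiwi", "banana", "cherry"]
cut_bins = cut_sketch(words, ["c", "m"])  # [2, 0, 1, 1, 0, 1]
qcut_bins = qcut_sketch(words, 3)         # [2, 0, 1, 2, 0, 1]
```

Both functions would run unchanged on integers or floats, since they rely only on `<` comparisons between elements.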

Redesign `cut` and `qcut`

Another bikeshed suggestion: `bin` and `bin_sorted`?

Redesign `cut` and `qcut`

> A cut_rank or cut_partition would be fine and avoid some confusion. But to me, it is a different/additional function doing something different.

> Modern gradient boosting libs like XGBoost,...

Redesign `cut` and `qcut`

Alright, proposed redesign, take two. We remove `cut` and `qcut` (which I honestly feel are pretty bad names anyway) in favor of three binning functions: ```python def bin_intervals( self, intervals:...

Redesign `cut` and `qcut`

> I'm unsure how an argument regarding the connection between quantities and probability led to the need for changing two existing function names, their parameter names, and their default values,...

Redesign `cut` and `qcut`

@xuJ14 > I think 'drop' should be added, since sometimes we do not want multiple values for the breakpoint. That's what first / last do, they just allow more flexibility...