Christian Lorentzen comments

Results: 319 comments of Christian Lorentzen

Redesign `cut` and `qcut`

To be honest, I really do not like this discussion. There is the issue of having a separate parameter to say: "Hello `qcut`, I just want 5 equally spaced quantiles."...
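For context, a minimal sketch of how the two existing pandas functions behave when asked for 5 bins. The sample data is made up for illustration and is not taken from the discussion: `pd.qcut` with an integer `q` places edges at equally spaced quantile levels, while `pd.cut` with an integer `bins` places equally wide edges over the value range.

```python
import numpy as np
import pandas as pd

# Illustrative, skewed sample (an assumption, not data from the thread).
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Integer q: bin edges at the 0%, 20%, ..., 100% quantiles of x.
by_quantile = pd.qcut(x, q=5)

# Integer bins: 5 equal-width intervals over [min(x), max(x)].
by_width = pd.cut(x, bins=5)

# Count the observations per bin, in bin order.
counts_quantile = np.bincount(by_quantile.cat.codes, minlength=5).tolist()
counts_width = np.bincount(by_width.cat.codes, minlength=5).tolist()

print(counts_quantile)
print(counts_width)
```

On this sample the quantile-based bins each hold the same number of observations, whereas the equal-width bins put almost everything into the first interval.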

A `cut_rank` or `cut_partition` would be fine and avoid some confusion. But to me, it is a different/additional function doing something different. Modern gradient boosting libs like XGBoost, LightGBM and...

Redesign `cut` and `qcut`

> @lorentzenchr Perhaps you find my argument more convincing if you consider that both '`cut`' and '`qcut`' are perfectly well-defined functions over series of _strings_. As long as strings have...

Redesign `cut` and `qcut`

> What would be the quantile with probability level 0.5 for the sample ["bar", "baz", "foo", "moo"]? We have `"bar" < "baz" < "foo" < "moo"`, therefore the 50%-quantile is...

[MAINT] Remove numba as required dependency

If I understand correctly, the sparse package is used to provide the "array_backend" for the triangle data (`triangle.values` array). How about making the sparse backend optional? (if sparse is installed...
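One common way to make such a backend optional is to check at runtime whether the package can be imported and to fall back otherwise. The sketch below is only an illustration of that pattern: the helper name `pick_array_backend` and the choice of `numpy` as the fallback are assumptions, not the project's actual API.

```python
import importlib.util


def pick_array_backend(optional_module: str = "sparse", fallback: str = "numpy") -> str:
    """Return the optional backend's name if it is importable, else the fallback.

    Hypothetical helper for illustration; both names are assumptions.
    """
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(optional_module) is not None:
        return optional_module
    return fallback


backend = pick_array_backend()
print(f"array backend: {backend}")
```

With this pattern, the optional package is only needed by users who explicitly want it, and everyone else gets the fallback without an import error.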

fix: use log_wright_bessel to fix overflow

Which test is failing?

ENH allow up to 65536 bins in HGBT

> > Disadvantages:
> >
> > In case we want to add row-wise histogram computation, this becomes more complicated as BinnedData is inherently F-contiguous and it is harder to...

ENH allow up to 65536 bins in HGBT

Please do not commit to this branch. I'll update it soon. For reviewers, focus entirely on the changes in the ensemble folder, in particular common.pxd and common.pyx. Edit: The finding...

ENH allow up to 65536 bins in HGBT

Very strange. There is now quite a performance regression. While up to and including https://github.com/scikit-learn/scikit-learn/pull/28603/commits/91608fbd0b43ac7ae2c6118ccaff849d9604aab9 everything is fine ``` % python bench_hist_gradient_boosting_categorical_only.py --n-samples 100_000 --verbose [99/100] 1 tree, 31 leaves,...

ENH allow up to 65536 bins in HGBT

It would actually help me if someone could confirm the performance regression of https://github.com/scikit-learn/scikit-learn/pull/28603/commits/0107e777b3dfd12f8bcc5c3337bd39a6877a7026, i.e. run

```
python bench_hist_gradient_boosting_categorical_only.py --n-samples 100_000 --verbose
```

on
- branch main
- on this...