Christian Lorentzen
To be honest, I really do not like this discussion. There is the issue of having a separate parameter to say: "Hello `qcut`, I just want 5 equally spaced quantiles."...
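Tied data is one place where this tension shows up. In the minimal pandas sketch below, the sample values are invented for illustration. Asking `pd.qcut` for 5 quantile bins on heavily tied data either raises an error or, with `duplicates="drop"`, quietly returns fewer bins than requested.

```python
import pandas as pd

# Invented, heavily tied sample: six of the ten values are 0
s = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 3, 4])

# qcut puts bin edges at the 0%, 20%, ..., 100% quantiles; here the 0%, 20%
# and 40% quantiles are all 0, so the edges are not unique
raised = False
try:
    pd.qcut(s, 5)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Dropping duplicate edges succeeds but yields fewer bins than requested
binned = pd.qcut(s, 5, duplicates="drop")
print(len(binned.cat.categories))  # 3
```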
A `cut_rank` or `cut_partition` would be fine and would avoid some confusion. But to me, it is a different, additional function that does something different. Modern gradient boosting libs like XGBoost, LightGBM and...
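A `cut_rank`-style function could be sketched with existing pandas calls: break ties with `Series.rank(method="first")` and then call `qcut` on the ranks. The name `cut_rank` is hypothetical and is not a pandas function. Because ties are broken by order of appearance, equal values can end up in different groups. The result is a partition of the sample into equal-size groups rather than a binning by quantile edges.

```python
import pandas as pd


def cut_rank(x, q):
    """Hypothetical `cut_rank`: split `x` into `q` groups of (nearly) equal size.

    Ties are broken by order of appearance (rank method="first"), so tied
    values may land in different groups.
    """
    ranks = pd.Series(x).rank(method="first")
    # Ranks are all distinct, so the quantile edges are always unique
    return pd.qcut(ranks, q, labels=False)


x = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
codes = cut_rank(x, 5)
print(codes.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Note that the six zeros are spread over groups 0, 1 and 2.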
> @lorentzenchr Perhaps you find my argument more convincing if you consider that both `cut` and `qcut` are perfectly well-defined functions over series of _strings_. As long as strings have...
> What would be the quantile with probability level 0.5 for the sample ["bar", "baz", "foo", "moo"]? We have `"bar" < "baz" < "foo" < "moo"`, therefore the 50%-quantile is...
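An empirical quantile needs only an ordering, not arithmetic. The sketch below uses a hypothetical helper, `quantile_bounds`, and relies on sorting and indexing alone. It returns the lower quantile, inf{x : F(x) ≥ p}, and the upper quantile, inf{x : F(x) > p}, of the empirical distribution F. For the four strings above and p = 0.5 the two bounds differ, and every value between them satisfies the quantile definition F(q−) ≤ p ≤ F(q).

```python
import math
from fractions import Fraction


def quantile_bounds(sample, p):
    """Hypothetical helper: lower and upper empirical p-quantile of `sample`.

    Uses only sorting and indexing, so any totally ordered type works,
    e.g. strings under lexicographic order.
    """
    # Exact arithmetic avoids float surprises such as 10 * 0.3 > 3
    p = Fraction(str(p))
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(sample)
    n = len(xs)
    # Lower quantile inf{x : F(x) >= p} is the order statistic x_(ceil(n p))
    lower = xs[math.ceil(n * p) - 1]
    # Upper quantile inf{x : F(x) > p} is the order statistic x_(floor(n p) + 1)
    upper = xs[math.floor(n * p)]
    return lower, upper


print(quantile_bounds(["bar", "baz", "foo", "moo"], 0.5))  # ('baz', 'foo')
```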
If I understand correctly, the sparse package is used to provide the "array_backend" for the triangle data (`triangle.values` array). How about making the sparse backend optional? (if sparse is installed...
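One common way to make a backend optional is to check whether the package can be imported and fall back otherwise. The sketch below assumes `sparse` refers to the pydata `sparse` package and that `"numpy"` is an acceptable fallback. The helper `resolve_array_backend` is hypothetical and does not belong to any library.

```python
import importlib.util
import warnings

# Assumption: `sparse` is an optional dependency, detected without importing it
HAS_SPARSE = importlib.util.find_spec("sparse") is not None


def resolve_array_backend(requested="sparse"):
    """Hypothetical helper: pick the backend for `triangle.values`.

    Falls back to "numpy" with a warning when "sparse" is requested but
    the package is not installed.
    """
    if requested == "sparse" and not HAS_SPARSE:
        warnings.warn("sparse is not installed; falling back to numpy")
        return "numpy"
    return requested


backend = resolve_array_backend()
print(backend)  # "sparse" if installed, otherwise "numpy"
```

With this pattern, `import sparse` would only happen inside the code paths that actually use the sparse backend, so the base install does not need the package.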
Which test is failing?
> > Disadvantages: In case we want to add row-wise histogram computation, this becomes more complicated as `BinnedData` is inherently F-contiguous and it is harder to...
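The layout issue can be seen in a small NumPy sketch. The array is an invented stand-in for a binned feature matrix and is not the actual `BinnedData` class. In column-major (F-contiguous) storage, one feature is a contiguous block of memory, while one sample's values are spread out with a stride of `n_samples` bytes.

```python
import numpy as np

# Invented stand-in for binned data: 1000 samples x 8 features of uint8 bin
# indices, stored column-major (F-contiguous)
rng = np.random.default_rng(0)
X_binned = np.asfortranarray(
    rng.integers(0, 255, size=(1000, 8), dtype=np.uint8)
)

# One feature (a column) is a single contiguous block: a per-feature
# histogram streams through memory
col = X_binned[:, 3]
print(col.flags["C_CONTIGUOUS"], col.strides)  # True (1,)

# One sample (a row) jumps n_samples bytes from feature to feature, so
# row-wise histogram updates access memory with a large stride
row = X_binned[42, :]
print(row.flags["C_CONTIGUOUS"], row.strides)  # False (1000,)
```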
Please do not commit to this branch. I'll update it soon. Reviewers, please focus solely on the changes in the `ensemble` folder, in particular `common.pxd` and `common.pyx`. Edit: The finding...
Very strange: there is now quite a performance regression. Up to and including https://github.com/scikit-learn/scikit-learn/pull/28603/commits/91608fbd0b43ac7ae2c6118ccaff849d9604aab9 everything is fine: ``` % python bench_hist_gradient_boosting_categorical_only.py --n-samples 100_000 --verbose [99/100] 1 tree, 31 leaves,...
It would actually help me if someone could confirm the performance regression of https://github.com/scikit-learn/scikit-learn/pull/28603/commits/0107e777b3dfd12f8bcc5c3337bd39a6877a7026, i.e., run `python bench_hist_gradient_boosting_categorical_only.py --n-samples 100_000 --verbose` on branch `main` and on this...