Christian Lorentzen

329 comments of Christian Lorentzen

That's great. Thank you for all your work!

Given the long discussion in #5516, it would be better to ping @jnothman, @amueller and @glemaitre.

To add a simple example:

```python
import numpy as np

def naive_sum(a):
    result = np.zeros_like(a, shape=(1,))
    for i in range(a.shape[0]):
        result[0] += a[i]
    return result[0]

a = np.linspace(-100, 100, 1_000_000,...
```
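For reference, a runnable version of that snippet; the remaining `linspace` arguments (here `dtype=np.float32`) and the comparison with `np.sum` are assumptions about how the truncated example continues:

```python
import numpy as np

def naive_sum(a):
    # Sequential accumulation in the input dtype, one element at a time.
    result = np.zeros_like(a, shape=(1,))
    for i in range(a.shape[0]):
        result[0] += a[i]
    return result[0]

# Assumed completion of the truncated call: float32 values whose exact sum is 0.
a = np.linspace(-100, 100, 1_000_000, dtype=np.float32)

print(naive_sum(a))  # the running sum accumulates rounding error
print(np.sum(a))     # pairwise summation, typically much closer to 0
```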

> For single output RFR trained with the squared error criterion the impurity of the leaves can be used as a crude but useful estimate of the aleatoric uncertainty. Very...
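As a concrete sketch of that idea (not an official scikit-learn uncertainty API): with `criterion="squared_error"`, `tree_.impurity[node]` holds the variance of `y` among the training samples in that node, so averaging the impurity of the leaves a sample falls into across the forest gives a rough per-sample variance estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
rf = RandomForestRegressor(criterion="squared_error", random_state=0).fit(X, y)

# apply() returns the leaf index of each sample in every tree: shape (n_samples, n_trees).
leaves = rf.apply(X[:5])

# Mean leaf impurity over the trees as a crude per-sample aleatoric variance estimate.
leaf_variance = np.array(
    [
        [tree.tree_.impurity[leaf] for tree, leaf in zip(rf.estimators_, sample_leaves)]
        for sample_leaves in leaves
    ]
).mean(axis=1)
print(leaf_variance)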

What does scikit-learn think of adding the "option to store data samples in the leaf nodes"? That would require highly complex C/Cython surgery. I prefer to add such an uncertainty estimate...
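For context, a user-level workaround that needs no tree-code surgery: the mapping from training samples to leaf nodes can be reconstructed after fitting with the public `apply` method and kept alongside the model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)

# Leaf id of every training sample, via the public API.
leaf_of_sample = tree.apply(X)

# Invert the mapping: which training samples (and targets) sit in each leaf.
samples_in_leaf = {
    leaf: np.flatnonzero(leaf_of_sample == leaf) for leaf in np.unique(leaf_of_sample)
}
y_in_leaf = {leaf: y[idx] for leaf, idx in samples_in_leaf.items()}
```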

The original proposal to add an extension to `?syrk` to support `C += alpha A (D A)^T` for general A and diagonal D was once discussed on the [mailing list](https://groups.google.com/g/blis-devel/c/rUUuaHrvxDw/m/wI2TVrMEAwAJ)...
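For illustration, reading the proposed update as the symmetric sandwich product `C += alpha * A @ D @ A.T` with diagonal `D` (the shapes and values below are made up), the operation looks like this in NumPy; because `D` is diagonal it can be applied by scaling the columns of `A`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 6
A = rng.standard_normal((n, k))   # general matrix
d = rng.standard_normal(k)        # diagonal of D
alpha = 2.0
C = np.zeros((n, n))

# Sandwich update: C += alpha * A @ diag(d) @ A.T.
# Scaling the columns of A by d avoids forming the k x k diagonal matrix.
C += alpha * (A * d) @ A.T

np.testing.assert_allclose(C, alpha * A @ np.diag(d) @ A.T)
```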

@adrinjalali see https://github.com/scikit-learn/scikit-learn/issues/18748#issuecomment-1949948090. We were ok with some possibility of data leakage. It's a tradeoff: either `X_val, y_val` are constructed with the same preprocessing as `X_train, y_train` (often desirable) or...
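To make the first option concrete (a minimal sketch with a placeholder transformer; how the split and the preprocessing are wired up inside a Pipeline is exactly what the linked discussion is about):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1_000, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# "Same preprocessing": fit the transformer on the training split only and
# apply the fitted transformer to both splits, so that X_val is expressed in
# the same feature space as X_train when the estimator consumes it.
scaler = StandardScaler().fit(X_train)
X_train_t = scaler.transform(X_train)
X_val_t = scaler.transform(X_val)
```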

**TLDR:** I don't think information leakage is a real problem here. IIUC, you prefer the GridSearchCV version in https://github.com/scikit-learn/scikit-learn/issues/26359#issuecomment-1571838168. For the estimator considered here that would mean:
```python
from sklearn.datasets...
```

> ```python
> # telling pipeline to transform these inputs up to the step which is
> # requesting them.
> transform_input=["X_val", "y_val"],
> ```

@adrinjalali This is better proposed...

> We have several speed optimizations in the 3.0.0 release. In particular, LightGBM also implements a row-wise histogram algorithm (previously column-wise), and may be faster with a smaller #threads or smaller...