Christian Lorentzen

329 comments of Christian Lorentzen

That's great. Thank you for all your work!

Given the long discussion in #5516, it would be better to ping @jnothman, @amueller and @glemaitre.

To add a simple example:

```python
import numpy as np

def naive_sum(a):
    result = np.zeros_like(a, shape=(1,))
    for i in range(a.shape[0]):
        result[0] += a[i]
    return result[0]

a = np.linspace(-100, 100, 1_000_000,...
```
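For reference, a runnable version of that snippet; the remaining `linspace` arguments (here `dtype=np.float32`) and the comparison with `np.sum` are assumptions about how the truncated example continues:

```python
import numpy as np

def naive_sum(a):
    # Sequential accumulation in the input dtype, one element at a time.
    result = np.zeros_like(a, shape=(1,))
    for i in range(a.shape[0]):
        result[0] += a[i]
    return result[0]

# Assumed completion of the truncated call: float32 values whose exact sum is 0.
a = np.linspace(-100, 100, 1_000_000, dtype=np.float32)

print(naive_sum(a))  # the running sum accumulates rounding error
print(np.sum(a))     # pairwise summation, typically much closer to 0
```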

> For single output RFR trained with the squared error criterion the impurity of the leaves can be used as a crude but useful estimate of the aleatoric uncertainty. Very...
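As a concrete sketch of that idea (not an official scikit-learn uncertainty API): with `criterion="squared_error"`, `tree_.impurity[node]` holds the variance of `y` among the training samples in that node, so averaging the impurity of the leaves a sample falls into across the forest gives a rough per-sample variance estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
rf = RandomForestRegressor(criterion="squared_error", random_state=0).fit(X, y)

# apply() returns the leaf index of each sample in every tree: shape (n_samples, n_trees).
leaves = rf.apply(X[:5])

# Mean leaf impurity over the trees as a crude per-sample aleatoric variance estimate.
leaf_variance = np.array(
    [
        [tree.tree_.impurity[leaf] for tree, leaf in zip(rf.estimators_, sample_leaves)]
        for sample_leaves in leaves
    ]
).mean(axis=1)
print(leaf_variance)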

What does scikit-learn think of adding the "option to store data samples in the leaf nodes"? That would require highly complex C/Cython surgery. I prefer to add such an uncertainty estimate...
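For context, a user-level workaround that needs no tree-code surgery: the mapping from training samples to leaf nodes can be reconstructed after fitting with the public `apply` method and kept alongside the model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)

# Leaf id of every training sample, via the public API.
leaf_of_sample = tree.apply(X)

# Invert the mapping: which training samples (and targets) sit in each leaf.
samples_in_leaf = {
    leaf: np.flatnonzero(leaf_of_sample == leaf) for leaf in np.unique(leaf_of_sample)
}
y_in_leaf = {leaf: y[idx] for leaf, idx in samples_in_leaf.items()}
```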

The original proposal to add an extension to `?syrk` to support `C += alpha A (D A)^T` for general A and diagonal D was once discussed on the [mailing list](https://groups.google.com/g/blis-devel/c/rUUuaHrvxDw/m/wI2TVrMEAwAJ)...
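For illustration, reading the proposed update as the symmetric sandwich product `C += alpha * A @ D @ A.T` with diagonal `D` (the shapes and values below are made up), the operation looks like this in NumPy; because `D` is diagonal it can be applied by scaling the columns of `A`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 6
A = rng.standard_normal((n, k))   # general matrix
d = rng.standard_normal(k)        # diagonal of D
alpha = 2.0
C = np.zeros((n, n))

# Sandwich update: C += alpha * A @ diag(d) @ A.T.
# Scaling the columns of A by d avoids forming the k x k diagonal matrix.
C += alpha * (A * d) @ A.T

np.testing.assert_allclose(C, alpha * A @ np.diag(d) @ A.T)
```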

@adrinjalali see https://github.com/scikit-learn/scikit-learn/issues/18748#issuecomment-1949948090. We were ok with some possibility of data leakage. It's a tradeoff: either `X_val, y_val` are constructed with the same preprocessing as `X_train, y_train` (often desirable) or...
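To make the first option concrete (a minimal sketch with a placeholder transformer; how the split and the preprocessing are wired up inside a Pipeline is exactly what the linked discussion is about):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1_000, n_features=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# "Same preprocessing": fit the transformer on the training split only and
# apply the fitted transformer to both splits, so that X_val is expressed in
# the same feature space as X_train when the estimator consumes it.
scaler = StandardScaler().fit(X_train)
X_train_t = scaler.transform(X_train)
X_val_t = scaler.transform(X_val)
```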

**TLDR:** I don't think information leakage is a real problem here. IIUC, you prefer the GridSearchCV version in https://github.com/scikit-learn/scikit-learn/issues/26359#issuecomment-1571838168. For the estimator considered here that would mean:
```python
from sklearn.datasets...
```

> ```python
> # telling pipeline to transform these inputs up to the step which is
> # requesting them.
> transform_input=["X_val", "y_val"],
> ```

@adrinjalali This is better proposed...

> We have several speed optimizations in the 3.0.0 release. In particular, LightGBM also implements a row-wise histogram algorithm (previously column-wise), and may be faster with a smaller #threads or smaller...