Thomas J. Fan

255 comments by Thomas J. Fan

It looks like the two motivations for the `InputArray` are:

1. Input feature names for model inspection, which pushes for defining `input_feature_names_`.
2. Allowing estimators to treat columns differently...

Scikit-learn mostly treats a DataFrame as a "2D ndarray with column names". Only the `OrdinalEncoder` and `OneHotEncoder` treat the DataFrame as "a collection of 1D arrays". When scikit-learn's models...

> is it more a "fingers crossed that Pandas doesn't change this"?

It's fingers crossed. I've seen a proposal for a 2D extension array, but I think there is a...

After discussing with @eapolinario, I like going with an `--include` flag that can include directories or files. With https://github.com/flyteorg/flytekit/pull/2663, the automatic copy behavior copies over everything that is imported by the...

> If the user copies entire Python packages they are iterating on into the tarball with this flag, are they expected to adapt the Python path?

If we can get...

> It is hard to know if people want to browse an OCI registry. For sure GitHub provides searches based on image names.

As a user, as long as https://conda-forge.org/packages/...

Moving my comment from https://github.com/scikit-learn/scikit-learn/pull/25956/files#r1172771551 here regarding adding more methods to scikit-learn's compat layer:

> Ultimately, we decided to add methods to the wrappers only when it is going to...

I do not have the bandwidth to review this PR. If @Micky774 has time, I think he will have the most recent context to review a cluster metric.

> Thomas, is this a proposal to add rng= without removing/deprecating random_state=?

No. That proposal is to not use `rng` at all and continue to use `random_state` even when we...

For random forest, I used the same behavior as in HistGradientBoosting, which adopted the behavior from LightGBM. @NicolasHug Do you have more context on the missing-value behavior for HistGradientBoosting?
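For context, scikit-learn documents the HistGradientBoosting behavior roughly as follows: during training, each split learns whether samples with missing values go to the left or right child based on the potential gain, and if a feature had no missing values during training, samples with missing values at predict time go to the child with the most samples. The snippet below is a simplified sketch of that rule, not scikit-learn's implementation. It assumes a plain sum-of-squared-errors criterion, a single fixed threshold, and `None` as the missing marker; the function names `sse`, `learn_missing_direction` and `route_missing` are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def learn_missing_direction(x, y, threshold):
    """Decide where rows with a missing feature value (None) go for a fixed
    threshold. Returns None if no value was missing during training."""
    left = [yi for xi, yi in zip(x, y) if xi is not None and xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi is not None and xi > threshold]
    missing = [yi for xi, yi in zip(x, y) if xi is None]
    if not missing:
        return None
    # Try both placements and keep the one with the lower total error (higher gain).
    cost_if_left = sse(left + missing) + sse(right)
    cost_if_right = sse(left) + sse(right + missing)
    return "left" if cost_if_left <= cost_if_right else "right"


def route_missing(learned_direction, n_left, n_right):
    """Predict-time routing of a missing value at this split."""
    if learned_direction is not None:
        return learned_direction
    # No missing values seen in training: follow the larger child
    # (breaking ties toward the left is an assumption of this sketch).
    return "left" if n_left >= n_right else "right"


x = [1, 2, 3, 10, 11, None]
print(learn_missing_direction(x, [1.0, 1.0, 1.0, 5.0, 5.0, 5.0], threshold=3))  # right
print(learn_missing_direction(x, [1.0, 1.0, 1.0, 5.0, 5.0, 1.0], threshold=3))  # left
print(route_missing(None, n_left=3, n_right=2))  # left
```

The missing row follows whichever side its target resembles, which is the gain-based choice described above.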