
[FEA] Support `feature_names_in_` attribute

Open · tvdboom opened this issue 2 years ago · 2 comments

Is your feature request related to a problem? Please describe.

For cuML estimators to work as a drop-in replacement for scikit-learn, they should expose the same attributes. One that is often used in pipelines is `feature_names_in_`, which contains the names of the features seen during fit (when the input is a `pandas.DataFrame` or `cudf.DataFrame`).

Describe the solution you'd like

All cuML estimators should expose the `feature_names_in_` attribute after fit. Currently, only `n_features_in_` is supported:

from sklearn.datasets import load_breast_cancer
from cuml.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

scaler = StandardScaler().fit(X)
print(scaler.n_features_in_)  # Works
print(scaler.feature_names_in_)  # AttributeError

Implementing this could potentially help with #5564
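To make the requested behaviour concrete, here is a minimal sketch of how the names could be recorded at fit time. It assumes the scikit-learn convention that `feature_names_in_` is only defined when every column name is a string. `record_feature_names` is a hypothetical helper, not part of cuML, and the `SimpleNamespace` objects are stand-ins for a fitted estimator and a DataFrame so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def record_feature_names(estimator, X):
    """Hypothetical helper: store the column names of X on the estimator.

    Follows the scikit-learn convention that `feature_names_in_` is only
    set when X has column names and all of them are strings.
    """
    columns = getattr(X, "columns", None)  # DataFrame-like inputs expose .columns
    if columns is None:
        return estimator  # plain array-like input: nothing to record
    names = list(columns)
    if names and all(isinstance(name, str) for name in names):
        estimator.feature_names_in_ = names
    return estimator


# Stand-ins (assumptions for illustration only): in practice these would be
# a fitted cuML estimator and a pandas or cuDF DataFrame.
fitted = SimpleNamespace(n_features_in_=2)
frame = SimpleNamespace(columns=["mean radius", "mean texture"])

record_feature_names(fitted, frame)
print(fitted.feature_names_in_)
```

Note that scikit-learn stores the names as a NumPy array of dtype `object`; the sketch keeps a plain list to stay dependency-free.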

tvdboom · Nov 29 '23 10:11

Hello! Would you like to move forward with this issue? Or will it be okay if I start working on the feature?

jinsolp · May 21 '24 21:05

Feel free to work on it!

tvdboom · May 22 '24 05:05