KeyError [...77789,77790] index not found error when I put AutoML in sklearn pipeline and run it

Open busekoseoglu opened this issue 3 years ago • 3 comments

Code:

categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group', "name_in_email"]

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)
    ])

clf = Pipeline(steps=[
    ('missing', fill_missing()),
    ('outlier', outlier_filling()),
    ('preprocessor', preprocessor),
    ('classifier', AutoML())])

clf.fit(X_train, y_train)

Note: It works when AutoML is replaced with RandomForestClassifier.

busekoseoglu avatar Apr 14 '22 16:04 busekoseoglu

@busekoseoglu I can't reproduce this problem with my synthetic data for testing. Could you please share an example dataset to reproduce this problem? BTW, you don't have to use one hot encoding before AutoML.fit(). It often works better without this encoding.

sonichi avatar Apr 14 '22 17:04 sonichi

Of course, I am attaching an example CSV file: sampledf.csv. I ran it without one-hot encoding, but I'm wondering if it will work with the encoding as well.

busekoseoglu avatar Apr 15 '22 15:04 busekoseoglu

This works for me:

from flaml import AutoML
import pandas as pd

df = pd.read_csv("https://github.com/microsoft/FLAML/files/8496779/sampledf.csv")
X = df.drop(columns="has_paid")
y = df["has_paid"]
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group',"name_in_email"]

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)
    ])

clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', AutoML())])

clf.fit(X, y)

I removed the first two steps in your pipeline because they are undefined.

('missing', fill_missing()), ('outlier', outlier_filling()),
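
For anyone who wants to put those steps back: each one has to be a scikit-learn transformer, and it is probably safest if it returns the same rows it received so that X and y stay aligned. Here is a minimal sketch under those assumptions. `FillMissing` and the toy data are hypothetical (the original `fill_missing()` and `outlier_filling()` were never posted), and `RandomForestClassifier` stands in for `AutoML()` only to keep the sketch light.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class FillMissing(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for fill_missing(); the original was not posted."""

    def __init__(self, fill_value="missing"):
        self.fill_value = fill_value

    def fit(self, X, y=None):
        # Nothing to learn for a constant fill.
        return self

    def transform(self, X):
        # fillna returns a new DataFrame with the same rows and index,
        # so X and y stay aligned for the later steps.
        return X.fillna(self.fill_value)


categorical_features = ["merchant_category", "merchant_group", "name_in_email"]

# Toy data (made up) with a non-contiguous index, such as a train/test split leaves.
toy_X = pd.DataFrame(
    {
        "merchant_category": ["food", None, "travel", "food"],
        "merchant_group": ["a", "b", "b", None],
        "name_in_email": ["yes", "no", "no", "yes"],
    },
    index=[3, 7, 11, 20],
)
toy_y = pd.Series([0, 1, 0, 1], index=toy_X.index)

preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)]
)

clf = Pipeline(steps=[
    ("missing", FillMissing()),
    ("preprocessor", preprocessor),
    # Stand-in final estimator; AutoML() would take this place.
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

clf.fit(toy_X, toy_y)
predictions = clf.predict(toy_X)
```

An outlier step would follow the same pattern. If a custom step drops or reorders rows instead, X and y can fall out of sync, which may be worth checking when a pipeline fails with an index-related error.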

sonichi avatar Apr 15 '22 16:04 sonichi