KeyError [...77789,77790] index not found error when I put AutoML in sklearn pipeline and run it
Code:
categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group', 'name_in_email']
preprocessor = ColumnTransformer( transformers=[ ('cat', categorical_transformer, categorical_features) ])
clf = Pipeline(steps=[('missing', fill_missing()), ('outlier', outlier_filling()), ('preprocessor', preprocessor), ('classifier', AutoML())])
clf.fit(X_train, y_train)
Note: It works when AutoML is replaced with RandomForestClassifier.
@busekoseoglu I can't reproduce this problem with my synthetic data for testing. Could you please share an example dataset to reproduce this problem? BTW, you don't have to use one hot encoding before AutoML.fit(). It often works better without this encoding.
Of course, I am attaching an example CSV file: sampledf.csv. I ran it without one-hot encoding, but I'm still wondering whether it can work with the encoding as well.
This works for me:
from flaml import AutoML
import pandas as pd
df = pd.read_csv("https://github.com/microsoft/FLAML/files/8496779/sampledf.csv")
X = df.drop(columns="has_paid")
y = df["has_paid"]
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group',"name_in_email"]
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)
    ])
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', AutoML())])
clf.fit(X, y)
I removed the first two steps of your pipeline because fill_missing and outlier_filling are not defined in your snippet:
('missing', fill_missing()), ('outlier', outlier_filling()),
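One possible explanation for the difference, which is not confirmed anywhere in this thread: the failing run fits on X_train and y_train, while the working run fits on the unsplit X and y. After a shuffled split, a pandas Series keeps its original row labels, so any code that indexes y by row position with plain brackets hits labels that are no longer there and raises a KeyError listing them. The sketch below only demonstrates that pandas behavior on toy data. The values and variable names are made up for illustration, and whether FLAML indexes y this way internally is an assumption.

```python
import pandas as pd

# Toy target standing in for y (illustrative values, not the thread's data).
y = pd.Series([0, 1, 0, 1, 1, 0], name="has_paid")

# A shuffled split keeps the original row labels, so the index is 4, 0, 5, 2
# rather than 0..3.
y_train = y.iloc[[4, 0, 5, 2]]

# Row positions, as they would apply to a NumPy or sparse X after encoding.
positions = [0, 1, 2, 3]

try:
    y_train[positions]  # label-based lookup: labels 1 and 3 are missing
    raised = False
except KeyError:
    raised = True

# Two ways to make positions and labels agree before calling fit().
y_reset = y_train.reset_index(drop=True)
y_array = y_train.to_numpy()
```

If this is what is happening, passing y_train.reset_index(drop=True) or y_train.to_numpy() to clf.fit() would be worth trying.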