
StackingCVClassifier fails on pandas DataFrames

Open marketneutral opened this issue 6 years ago • 2 comments

I am attempting to use StackingCVClassifier where the base models are sklearn pipelines. The pipelines use sklearn.compose.ColumnTransformer and mlxtend.feature_selection.ColumnSelector. As such, when I call fit(...) I pass in a pandas DataFrame as my X (ColumnTransformer and ColumnSelector allow for named columns).

Calling fit fails with

/opt/conda/lib/python3.6/site-packages/mlxtend/classifier/stacking_cv_classification.py in fit(self, X, y, groups, sample_weight)
    239                 except KeyError as e:
    240 
--> 241                     raise KeyError(str(e) + '\nPlease check that X and y'
    242                                    ' are NumPy arrays. If X and y are pandas'
    243                                    ' DataFrames,\ntry passing them as'

KeyError: "'[    2     4     5 ... 31737 31738 31739] not in index'\nPlease check that X and y are NumPy arrays. If X and y are pandas DataFrames,\ntry passing them as X.values and y.values."

Can you advise on a workaround to be able to call StackingCVClassifier in this case?
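For reference, the KeyError above looks consistent with positional fold indices being used to index a DataFrame with plain `[]`, which pandas treats as a column-label lookup. This is an assumption drawn from the error message, not a quote of the mlxtend source. A minimal sketch with made-up toy data:

```python
import numpy as np
import pandas as pd

# Toy data; the column names "a" and "b" are made up for illustration.
X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# Positional row indices, like those a cross-validation splitter yields.
train_index = np.array([0, 2])

# Plain [] on a DataFrame looks the integers up as column labels,
# so this raises KeyError because no column is named 0 or 2.
try:
    X[train_index]
    raised = False
except KeyError:
    raised = True

# Positional row selection works on the underlying array or via .iloc.
rows_from_values = X.values[train_index]
rows_from_iloc = X.iloc[train_index].values
```

This is why the error message suggests passing X.values and y.values: NumPy arrays are indexed by position, while DataFrames are indexed by label.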

marketneutral avatar Nov 07 '19 22:11 marketneutral

It seems that both ColumnSelector and ColumnTransformer allow one to pass in column indices. Thus, instead of

select = ColumnSelector(cols=lgb_cols)

you can do

select = ColumnSelector(cols=[train.columns.get_loc(c) for c in lgb_cols])

You can then pass train.values to StackingCVClassifier, which resolves the issue. One odd thing, though: I find that the sklearn pipelines are much slower in this case.
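A minimal sketch of the name-to-position mapping, using only pandas and NumPy. The toy `train` frame and its column names are hypothetical stand-ins for the real data, and the pipeline and stacking classifier are left out. It only checks that selecting by position on train.values gives the same columns as selecting by name on the DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real training frame.
train = pd.DataFrame({
    "f0": [0.1, 0.2, 0.3],
    "f1": [1.0, 2.0, 3.0],
    "f2": [10.0, 20.0, 30.0],
    "f3": [5.0, 6.0, 7.0],
})
lgb_cols = ["f1", "f3"]  # named columns one base model should see

# Translate column names into integer positions.
# get_loc returns an int as long as the column names are unique.
col_idx = [train.columns.get_loc(c) for c in lgb_cols]

# Work on the plain NumPy array from here on.
X = train.values
selected = X[:, col_idx]

# Selecting by position matches selecting by name on the DataFrame.
same = np.array_equal(selected, train[lgb_cols].values)
```

The resulting `col_idx` list is what would be passed as `cols` to ColumnSelector. One possible, unverified contributor to the slowdown: if the frame has mixed dtypes, train.values is an object array, which is generally slower to process than a numeric one.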

marketneutral avatar Nov 07 '19 23:11 marketneutral

I am not completely sure whether this is related, but the recent change in response to #605 may fix this; it could have been an issue with the input checking. If you are not using the latest version from the master branch, try that one to see whether the workaround you described is still required.

You can install the latest version from the master branch via

pip install git+https://github.com/rasbt/mlxtend.git

rasbt avatar Nov 08 '19 02:11 rasbt