mlxtend StackingCVClassifier fails on pandas DataFrames

I am trying to use StackingCVClassifier with sklearn pipelines as the base models. The pipelines use sklearn.compose.ColumnTransformer and mlxtend.feature_selection.ColumnSelector. Since both accept column names, I pass a pandas DataFrame as X when I call fit(...).

Calling fit fails with

/opt/conda/lib/python3.6/site-packages/mlxtend/classifier/stacking_cv_classification.py in fit(self, X, y, groups, sample_weight)
    239                 except KeyError as e:
    240 
--> 241                     raise KeyError(str(e) + '\nPlease check that X and y'
    242                                    ' are NumPy arrays. If X and y are pandas'
    243                                    ' DataFrames,\ntry passing them as'

KeyError: "'[    2     4     5 ... 31737 31738 31739] not in index'\nPlease check that X and y are NumPy arrays. If X and y are pandas DataFrames,\ntry passing them as X.values and y.values."

Can you advise on a workaround to be able to call StackingCVClassifier in this case?
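
For reference, the error can probably be reproduced outside mlxtend. The sketch below assumes that the stacking code indexes X with the positional row indices from a CV split, as in X[train_idx]; that is an inference from the error message, not a quote of the library source. The frame and index values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with named columns (hypothetical data).
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 6.0, 7.0, 8.0]})

# Row positions, as a CV splitter would yield them.
train_idx = np.array([0, 2, 3])

try:
    # On a DataFrame, X[array] looks up COLUMNS labelled 0, 2, 3,
    # not rows at those positions, so the lookup raises KeyError.
    X[train_idx]
    failed = False
except KeyError:
    failed = True

# The same index works on the underlying NumPy array ...
rows_from_array = X.values[train_idx]
# ... and on the DataFrame when positional indexing is requested explicitly.
rows_from_iloc = X.iloc[train_idx]
```

On a NumPy array the same expression selects rows, which would explain why the error message suggests passing X.values and y.values.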

Nov 07 '19 22:11 marketneutral

It seems that both ColumnSelector and ColumnTransformer allow one to pass in column indices. Thus, instead of

select = ColumnSelector(cols=lgb_cols)

you can do

select = ColumnSelector(cols=[train.columns.get_loc(c) for c in lgb_cols])

and then pass train.values to StackingCVClassifier, which solves the issue. One odd thing, though: I find the sklearn pipelines are much slower in this case.
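
A minimal sketch of the name-to-position conversion, using only pandas so it runs without mlxtend. The column names, the data, and the helper names_to_positions are hypothetical; only train, lgb_cols, and the get_loc idiom come from the snippet above.

```python
import pandas as pd

# Hypothetical training frame; the real one has different columns.
train = pd.DataFrame({
    "age": [31, 45, 27],
    "income": [40.0, 85.5, 32.0],
    "city_code": [2, 0, 1],
    "tenure": [3, 10, 1],
})
lgb_cols = ["income", "tenure"]  # columns one base model should see


def names_to_positions(frame, names):
    """Map column names to integer positions (assumes unique column names)."""
    return [frame.columns.get_loc(name) for name in names]


lgb_idx = names_to_positions(train, lgb_cols)

# Selecting by position on the NumPy array gives the same data
# as selecting by name on the DataFrame.
X = train.values
selected = X[:, lgb_idx]
```

The resulting lgb_idx list is what goes into ColumnSelector(cols=...), with train.values as the X passed to fit. Note that .values converts a mixed-dtype frame to a single common dtype, which may be worth checking if the pipelines behave differently afterwards.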

Nov 07 '19 23:11 marketneutral

I am not completely sure this is related, but the recent change in response to #605 may fix it; the problem could have been in the input checking. If you are not using the latest version from the master branch, try that one to see whether the workaround you described is still required.

You can install the latest version from the master branch via

pip install git+git://github.com/rasbt/mlxtend.git

Nov 08 '19 02:11 rasbt