
Support fit_params for cross_val_score in StackingClassifier

Open agzamovr opened this issue 8 years ago • 4 comments

Is it possible to pass fit params to the individual classifiers? I tried to pass fit params for XGBClassifier but got an error. My code is the following:

clf1 = xgbooster()
clf2 = linear_svc()
lr = LogisticRegression()
fit_params = {
    'xgbclassifier__eval_metric': 'mlogloss',
    'xgbclassifier__eval_set': [(X_test, y_test)],
    'xgbclassifier__early_stopping_rounds': 100,
    'xgbclassifier__verbose': False}
estimator = StackingClassifier(classifiers=[clf1, clf2],
                               meta_classifier=lr)
mean_score = cross_val_score(estimator=estimator,
                             X=X_train,
                             y=y_train,
                             scoring='neg_log_loss',
                             cv=5, 
                             verbose=5, 
                             fit_params=fit_params,
                             n_jobs=-1).mean()

agzamovr avatar Apr 16 '17 07:04 agzamovr

Hi @agzamovr ,

Is it possible to pass fit params for individual classifiers?

Oh yes, of course :). I have an example here: http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/#example-3-stacked-classification-and-gridsearch

The syntax for accessing estimator params is similar to the one used by make_pipeline in scikit-learn. E.g.,

from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression())

params = {'logisticregression__C': [0.1, 1., 10.]}

grid = GridSearchCV(estimator=pipe, 
                    param_grid=params, 
                    cv=5)
grid.fit(X, y)

(Essentially, it is just the lowercased class name. If you have multiple objects of the same class, they are enumerated, e.g., 'logisticregression-1', 'logisticregression-2', etc.)

So, looking at your code above, it looks like you have a small typo, and it should be 'xgbooster' in

fit_params = {
    'xgbclassifier__eval_metric': 'mlogloss',
...

PS: If in doubt what the actual parameter names are, you could get a list via

estimator = StackingClassifier(classifiers=[clf1, clf2],
                               meta_classifier=lr)

estimator.get_params().keys()

Let me know if it solves the problem!

rasbt avatar Apr 16 '17 15:04 rasbt

Thank you for the response! xgbooster is not a class name; it's a method that creates an xgboost.XGBClassifier instance. For this reason I used the xgbclassifier__ prefix. In your example you use the param_grid parameter, but GridSearchCV also has a fit_params parameter. param_grid is used to configure the estimator, while fit_params is passed along when calling fit.
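To illustrate the distinction with a plain scikit-learn Pipeline (which, unlike StackingClassifier at the time of this thread, does route prefixed fit parameters) — a minimal sketch using sample_weight as the routed fit parameter:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.random.RandomState(0).randn(20, 3)
y = np.array([0, 1] * 10)
w = np.ones(20)

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# param_grid-style: hyperparameters are set on the estimator *before* fitting
pipe.set_params(logisticregression__C=10.0)

# fit_params-style: keyword arguments forwarded to the named step's fit method
pipe.fit(X, y, logisticregression__sample_weight=w)
```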

agzamovr avatar Apr 16 '17 16:04 agzamovr

Oh I see what you mean now. I don't know how I could have misread your issue so badly :P.

Yeah, unfortunately, this doesn't work yet. But I guess it shouldn't be too hard to add this feature; it could be pretty useful imho

rasbt avatar Apr 16 '17 17:04 rasbt

Hi @rasbt , I've encountered a problem when using GridSearchCV and cross_val_score together. I embedded GridSearchCV (inner CV) in cross_val_score (outer CV) following the method described in your book (Python Machine Learning), so that my training set is split into inner and outer parts: the inner part is used to tune hyper-parameters, while the outer part is used to evaluate the models with the best hyper-parameters. But how do I pass fit parameters to GridSearchCV rather than to the model? It seems like fit_params is only for the fit parameters of models, right? Looking forward to your reply, thank you very much!
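For context, the nested setup described above can be sketched as follows (SVC and iris are placeholder choices, not from the thread). Note that fit_params given to the outer cross_val_score are forwarded to the inner GridSearchCV's fit, which in turn passes them on to the wrapped estimator's fit on each split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: GridSearchCV tunes hyperparameters on each outer training fold
inner = GridSearchCV(SVC(), param_grid={'C': [0.1, 1.0, 10.0]}, cv=3)

# Outer loop: cross_val_score evaluates the tuned model on held-out folds
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```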

fredsamhaak avatar May 13 '21 10:05 fredsamhaak