
Add class weights to the EnsembleVotingClassifier

Open rp1908 opened this issue 9 years ago • 3 comments

I used a gradient boosting classifier to build a classification model, and I am trying to improve it with a stacked model. I want to ensemble 3 different models, say GBM, random forests, and logistic regression (apart from the GBM, the other models are subject to change). In my GBM model, I passed weights to the fit function, giving higher weights to the positive target variable. I want to try the same thing in the ensemble, but I am unable to figure out how to implement weights in the source code of the EnsembleVoteClassifier. I am new to this, so I would appreciate suggestions on how to implement the weights.

Thanks

rp1908 avatar Aug 01 '16 12:08 rp1908

Hi, the weights parameter of the classifier only controls the weights of the models (e.g., the GBM, random forest, and logistic regression). To clarify ... when you say "positive target variable", do you mean a high value of your performance metric, or "class label 1" in a binary classification setting?

While you have to tune the weights in the EnsembleVoteClassifier manually, i.e., via hyperparameter tuning such as grid search or random search, there's also an alternative approach called stacking. Since you mentioned that you want to

using a stack up model

you may be interested in the stacking alternative. I've implemented the simple version in mlxtend, and the documentation can be found here: http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/

However, note that this is the original StackingClassifier version, not the cross-validation variant (for more info: https://github.com/rasbt/mlxtend/issues/29). I was planning to add a StackingCVClassifier one day, though.

PS: I am closing this "issue" because it doesn't seem to be a bug in mlxtend, but please feel free to comment further in this thread.

rasbt avatar Aug 01 '16 22:08 rasbt

Hi Sebastian,

Thanks for your response. I wasn't sure how to ask the question, hence I used "issue" to ask it.

By weights (enriching the target variable in a biased sample), I mean assigning a higher weight to class 1 in the target variable. For example, if a dataset has 10 class-1 samples and 90 class-0 samples, I would assign a weight of 9 to class 1 and a weight of 1 to class 0. This is possible in GBM or logistic regression via the sample_weight argument of the fit function, but, as far as I can see in the source code, it is not possible in the EnsembleVoteClassifier. In the source code of the gradient boosting classifier, these weights are handled in the fit function. Do you think it would be possible to incorporate such weights in the EnsembleVoteClassifier too? I'd really appreciate any help.
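For context, the per-sample weighting described above can be sketched as follows. This is an illustrative example, not code from the thread; the dataset and the 9:1 weighting are made up to mirror the 10/90 imbalance mentioned above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Imbalanced toy dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=100, weights=[0.9, 0.1], random_state=0)

# Weight 9 for every class-1 sample, weight 1 for every class-0 sample
sample_weight = np.where(y == 1, 9.0, 1.0)

gbm = GradientBoostingClassifier(random_state=0)
gbm.fit(X, y, sample_weight=sample_weight)
print(gbm.score(X, y))
```

The same `sample_weight` keyword works for `LogisticRegression.fit` in scikit-learn, which is why the question is specifically about forwarding it through the ensemble wrapper.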

Regards, Ruthvik


rp1908 avatar Aug 02 '16 08:08 rp1908

Oh I see, based on the terms you used, I would have assumed that you meant the class labels :) One way to go about it would be to use the class_weight parameter of the respective classifier implementations; e.g., LogisticRegression and RandomForestClassifier in scikit-learn have one, too. But adding it to the EnsembleVoteClassifier would also be a nice enhancement (and the results may be a bit different from setting the class weights in the individual classifiers). I think it would not be too complicated to add. The classifier currently uses the weights as an optional weighting of the individual classifiers, i.e.,

```python
maj = np.apply_along_axis(lambda x:
                          np.argmax(np.bincount(x,
                                                weights=self.weights)),
                          axis=1,
                          arr=predictions)
```
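For illustration, here is how that weighted vote behaves on a toy predictions array; `self.weights` is replaced by a plain list here, and the numbers are made up for the example.

```python
import numpy as np

# rows = samples, columns = predictions of the individual classifiers
predictions = np.array([[0, 1, 1],
                        [0, 0, 1]])

# Give the third classifier three times the say of the other two
weights = [1.0, 1.0, 3.0]

maj = np.apply_along_axis(lambda x:
                          np.argmax(np.bincount(x, weights=weights)),
                          axis=1,
                          arr=predictions)
print(maj)  # → [1 1]: the third classifier's vote dominates both rows
```

Note that `np.bincount(x, weights=w)` sums `w[i]` into bin `x[i]`, so these weights are attached to classifiers (columns), not to class labels, which is exactly the limitation discussed here.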

As an alternative, to weight the class labels, one could just multiply the predictions, I guess. E.g.,

if the classifiers' votes for a given sample are [0, 1, 1, 0, 1] and you say that class 0 should be weighted twice as much as class 1, the votes would become [0, 0, 1, 1, 0, 0, 1], from which you can then select the majority class label as the prediction.
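The same effect as duplicating votes can be obtained by scaling the per-class bin counts instead. A minimal sketch, assuming a hypothetical helper (this is not the mlxtend API, just an illustration of the idea above):

```python
import numpy as np

def class_weighted_vote(votes, class_weights):
    """Majority vote where each class's vote count is scaled by its weight.

    votes: 1-D array of integer class labels predicted for one sample.
    class_weights: per-class weights, indexed by class label.
    """
    counts = np.bincount(votes, minlength=len(class_weights))
    return int(np.argmax(counts * np.asarray(class_weights)))

votes = np.array([0, 1, 1, 0, 1])  # the example votes from above

# Class 0 weighted twice as much as class 1: counts become 2*2=4 vs 3*1=3
print(class_weighted_vote(votes, [2.0, 1.0]))  # → 0
print(class_weighted_vote(votes, [1.0, 1.0]))  # → 1 (plain majority)
```

Multiplying the bin counts by the weights is equivalent to duplicating each vote in proportion to its class weight, but avoids materializing the longer vote list.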

I think that may be a useful feature, so please feel free to submit a pull request if you like!

rasbt avatar Aug 04 '16 07:08 rasbt