sample_weight in Explainable Boosting Machine (EBM)

Open SoulEvill opened this issue 5 years ago • 4 comments

I am not seeing sample_weight available in Explainable Boosting Machine. Is there an easy way to implement this? Any direction would be really appreciated, thanks a lot!

SoulEvill avatar Jul 14 '20 07:07 SoulEvill

Hi @SoulEvill - please check the answer to issue #115

lucas-a-meyer avatar Jul 30 '20 12:07 lucas-a-meyer

Thanks @RealLucasMeyer! Yes, sample_weight support for EBM is also being tracked in #115.

interpret-ml avatar Aug 17 '20 17:08 interpret-ml

(Reposting from issue #62): The latest release of interpret (0.2.5) now has support for sample weights in ExplainableBoostingMachines.

You can pass in positive floating point weights to the new sample_weight parameter of the ebm.fit() call. sample_weight should be the exact same shape and dimension as y -- one weight per sample. Here's a quick usage example:


from interpret.glassbox import ExplainableBoostingRegressor

# X: feature matrix, y: targets, w: one positive weight per sample (same shape as y)
ebm = ExplainableBoostingRegressor()
ebm.fit(X, y, sample_weight=w)
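
Editor's sketch, not part of the original reply: the requirements stated above (positive floating point weights, same shape as y) can be checked before calling fit. The helper name check_sample_weight is hypothetical and is not part of the interpret API; it only assumes NumPy is available.

```python
import numpy as np


def check_sample_weight(sample_weight, y):
    """Hypothetical helper: validate weights against the stated requirements."""
    w = np.asarray(sample_weight, dtype=float)
    y = np.asarray(y)
    # One weight per sample: the shapes must match exactly.
    if w.shape != y.shape:
        raise ValueError(f"sample_weight has shape {w.shape}, expected {y.shape}")
    # Weights must be positive and finite.
    if not np.all(np.isfinite(w)) or np.any(w <= 0):
        raise ValueError("sample_weight must contain only positive, finite values")
    return w


# Toy regression targets and matching weights.
y = np.array([1.2, 0.7, 3.4, 2.1])
w = check_sample_weight([1.0, 2.0, 0.5, 1.0], y)
```

The returned array can then be passed as sample_weight=w in the fit call shown above.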

You can also see more in our documentation: https://interpret.ml/docs/ebm.html#explainableboostingclassifier

To upgrade interpret using pip: pip install -U interpret

Let us know if you run into any issues! -InterpretML Team

interpret-ml avatar Jun 23 '21 21:06 interpret-ml

This is great! Thanks @interpret-ml !

From my experiments on heavily imbalanced data, BalancedRandomForestClassifier with its balanced_subsample mode performs better than EBM. Is there any way to compute the weights from the bootstrap sample of each tree grown, similar to balanced_subsample mode?

bwang482 avatar Jun 26 '21 17:06 bwang482

Closing this issue as the sample_weight aspect has been resolved.

Regarding making balanced subsamples, I believe you can already achieve this using the existing interface. We use stratified sampling for classification, so every bag should have the same number of samples in each class. You can set the per-class weight through the sample_weight parameter by setting the sample weights according to their label. If there's something I'm missing that deserves a longer discussion, please open a new issue for separate tracking.
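
Editor's sketch, not part of the original reply: one way to set sample weights according to label is the common n_samples / (n_classes * class_count) heuristic, which gives every class the same total weight. The helper names balanced_sample_weight and fit_weighted_ebm are hypothetical; only ExplainableBoostingClassifier and the sample_weight parameter of fit come from the thread. Whether this matches balanced_subsample behaviour on a given dataset is an assumption to verify empirically.

```python
import numpy as np


def balanced_sample_weight(y):
    """Return one weight per sample so that each class has equal total weight."""
    y = np.asarray(y).ravel()
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    # Rare classes get proportionally larger weights.
    class_weight = y.shape[0] / (classes.shape[0] * counts)
    # Map each sample's class index back to that class's weight.
    return class_weight[inverse]


def fit_weighted_ebm(X, y):
    """Fit an EBM classifier using the class-balanced weights above."""
    # Imported lazily so the weight helper works without interpret installed.
    from interpret.glassbox import ExplainableBoostingClassifier

    ebm = ExplainableBoostingClassifier()
    ebm.fit(X, y, sample_weight=balanced_sample_weight(y))
    return ebm


# Toy labels: 8 negatives and 2 positives.
y_demo = np.array([0] * 8 + [1] * 2)
w_demo = balanced_sample_weight(y_demo)
```

With these toy labels, each negative sample gets weight 0.625 and each positive sample gets 2.5, so both classes contribute a total weight of 5.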

paulbkoch avatar Jan 22 '23 20:01 paulbkoch