Sample Weight Support?

Open kmedved opened this issue 2 years ago • 7 comments

Hello - thanks all for the very interesting looking package. The hierarchical shrinkage wrapper seems especially interesting/novel. Would it be possible to add sample weight support to this package? For background, sample weights are supported by many scikit-learn estimators (e.g., RandomForestRegressor or HistGradientBoostingRegressor) and are passed via the fit call, e.g., model.fit(X_train, y_train, sample_weight=w_train).

The purpose of sample weights is to increase the weighting of rows/observations based on some external criterion, typically related to how the training data was gathered. For example, if your data comes from sensors of varying sensitivity, you may increase the sample weights for certain sensors. Alternatively, if your data is aggregated in some form, you can set the weights based on the aggregation (e.g., weekly data with a weight of 7, daily data with a weight of 1).

In terms of implementation, it's typically as simple as multiplying the loss for each row by the sample weights, to increase the model's sensitivity to large weightings, although I'm not sure if the novel hierarchical shrinkage capabilities of this package would present complications.
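As a rough illustration of the "multiply each row's loss by its weight" idea, here is a minimal pure-Python sketch. The function name `weighted_mse` and the toy numbers are made up for this example and are not part of imodels or scikit-learn. It also shows that an integer weight of 7 on one row gives the same loss as repeating that row 7 times, which is the aggregation case mentioned above.

```python
def weighted_mse(y_true, y_pred, sample_weight=None):
    """Mean squared error where each row's squared error is scaled by its weight."""
    if sample_weight is None:
        # No weights given: every row counts equally.
        sample_weight = [1.0] * len(y_true)
    weighted_losses = [
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, sample_weight)
    ]
    # Normalize by the total weight rather than the row count.
    return sum(weighted_losses) / sum(sample_weight)


# Toy data: the first row is a weekly aggregate (weight 7), the others are daily.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
weights = [7, 1, 1]

weighted = weighted_mse(y_true, y_pred, weights)

# Same data with the first row physically repeated 7 times and no weights.
y_true_rep = [1.0] * 7 + [2.0, 4.0]
y_pred_rep = [1.5] * 7 + [2.0, 3.0]
replicated = weighted_mse(y_true_rep, y_pred_rep)
```

Both calls give the same value, so for integer weights the weighted loss behaves like row duplication without the extra memory.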

Thanks again for the very interesting looking package. I look forward to testing and using it.

kmedved avatar Feb 08 '22 15:02 kmedved

Hi @kmedved 👋, thanks for your interest in the package! Indeed, supporting sample weights seems like it would be useful, and especially interesting for hierarchical shrinkage - we'll add it very soon :)

csinva avatar Feb 08 '22 17:02 csinva

An update: some of the models (but not all) now support sample_weight including FIGS, TAO, SLIM, CART, BoostedRules, SLIPPER, and SkopeRules. Still working on the others...

csinva avatar Jul 29 '22 03:07 csinva

Some parts of FIGS do not support sample_weight including the extract_sklearn_tree_from_figs() function.

mepland avatar Dec 29 '22 21:12 mepland

Thanks for the work on this @csinva. Any update on getting sample weight support added for hierarchical shrinkage?

kmedved avatar Dec 30 '22 02:12 kmedved

@aagarwal1996 @yanshuotan Can someone add in sample-weight support for HS?

csinva avatar Dec 30 '22 15:12 csinva

Actually HS already supports sample weights. sample_weight is fed into self.estimator_.fit() as an element of kwargs. For instance, see the following snippet:

[Screenshot: Screen Shot 2023-01-01 at 1 58 23 PM]

Furthermore, line 84 of the code uses weighted_n_node_samples to do shrinkage. When the original tree estimator is fit, it stores the weighted number of samples reaching each node in this array.
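To make the role of weighted_n_node_samples concrete, here is a small sketch using a plain scikit-learn DecisionTreeRegressor (not the HS wrapper itself). The toy data and variable names are assumptions chosen for illustration. It compares the unweighted per-node counts with the weighted ones after fitting with sample_weight.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: four rows, the last two carry three times the weight.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.array([1.0, 1.0, 3.0, 3.0])

# A depth-1 tree has a root and two leaves.
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X, y, sample_weight=w)

# Number of training rows reaching each node (ignores weights).
n_samples = tree.tree_.n_node_samples
# Sum of sample weights reaching each node (reflects sample_weight).
weighted_n = tree.tree_.weighted_n_node_samples

print("rows per node:   ", n_samples)
print("weight per node: ", weighted_n)
```

At the root, the unweighted count is 4 while the weighted count is 8.0 (the sum of `w`), so any computation based on weighted_n_node_samples picks up the sample weights automatically.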

I do agree that it may be beneficial to make sample_weight an explicit (optional) argument into fit. @csinva what do you think?
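The difference between forwarding sample_weight through kwargs and exposing it as an explicit argument can be sketched as follows. All three classes here (`RecordingEstimator`, `KwargsWrapper`, `ExplicitWrapper`) are hypothetical stand-ins written for this example; they are not the actual imodels implementation.

```python
class RecordingEstimator:
    """Hypothetical stand-in for a tree estimator; records what fit() received."""

    def fit(self, X, y, sample_weight=None):
        self.received_sample_weight_ = sample_weight
        return self


class KwargsWrapper:
    """Forwards sample_weight implicitly, as one element of **kwargs."""

    def __init__(self, estimator):
        self.estimator_ = estimator

    def fit(self, X, y, **kwargs):
        # sample_weight reaches the inner estimator only if the caller knows to pass it.
        self.estimator_.fit(X, y, **kwargs)
        return self


class ExplicitWrapper:
    """Exposes sample_weight as a named, optional argument of fit()."""

    def __init__(self, estimator):
        self.estimator_ = estimator

    def fit(self, X, y, sample_weight=None):
        self.estimator_.fit(X, y, sample_weight=sample_weight)
        return self


X = [[0.0], [1.0]]
y = [0.0, 1.0]
w = [1.0, 7.0]

implicit = KwargsWrapper(RecordingEstimator()).fit(X, y, sample_weight=w)
explicit = ExplicitWrapper(RecordingEstimator()).fit(X, y, sample_weight=w)
```

Both wrappers deliver the weights to the inner estimator. The explicit version makes the argument visible in the signature, so documentation and signature-inspecting tools can discover it.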

yanshuotan avatar Jan 01 '23 06:01 yanshuotan

Agreed, thanks Yan Shuo for adding HS sample_weight as an explicit argument in https://github.com/csinva/imodels/pull/156.

Should work now @kmedved!

csinva avatar Jan 03 '23 16:01 csinva