Monotone models
Hi!
Are there plans to implement monotonic regressors, as is possible in LightGBM, for example?
Thank you!
Best Robert
Hi @Garve,
Thanks for bringing this up, and sorry for our delay in getting back to you. We completely agree that the ability to enforce monotonicity would be a nice addition for EBMs, but we haven't had time to do it. There are a few different ways to implement this -- we can enforce it during training, like LightGBM, or we can provide options to enable it as a post-processing step on a trained EBM model.
When enforcing monotonicity during boosting, we've noticed that models tend to take advantage of correlated features to bypass the constraint. Enforcing monotonicity as a post-processing step might be more ideal, but it still requires further investigation for our model class. One way we've done this in the past is by applying isotonic regression (the Pool-Adjacent-Violators Algorithm or PAV) on the graphs that need to be monotonic.
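For illustration, a minimal sketch of that post-processing idea using scikit-learn's IsotonicRegression (which implements PAV); the shape_x/shape_y arrays below are made-up stand-ins for one term's bin positions and learned scores, not actual EBM attributes:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Stand-ins for one feature's learned shape function: bin positions and scores.
shape_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
shape_y = np.array([0.1, 0.3, 0.2, 0.6, 0.5, 0.9])   # not monotone yet

# PAV projects the scores onto the closest non-decreasing sequence
# (optionally weighted, e.g. by the number of training samples per bin).
iso = IsotonicRegression(increasing=True)
shape_y_monotone = iso.fit_transform(shape_x, shape_y)
print(shape_y_monotone)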
We'll leave this issue open to track the demand for this feature, and will update this thread once we've made some progress on the research or implementation sides. If anyone would like to discuss this further or help out on this feature, we'd be happy to talk with you!
Thanks! -InterpretML Team
Hello @interpret-ml team!
Thanks for the answer :) I would say that enforcement during training makes more sense. Doing it after training only alters what the model is actually saying, right? As a very naive approach, I could use max(0, [model output]). Then the model would output a negative value, but we make it a 0. Feels kind of hacky to me.
The direct approach might have some issues with correlations, but these problems are always there, no? We can create a dataset X, y and insert a copy of some column of X into X again.
import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor
from interpret import show
X = np.random.randn(10000, 2)
X = np.hstack([X, X[:,[0]]]) # insert copy
y = X[:, 0] + X[:, 1]
ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)
ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)
The ExplainableBoostingRegressor also can't tell whether feature 1 or feature 3 is more important. Each of them even ends up only half as important as feature 2. This is also a problem due to correlation. Therefore, I think that users should take care of correlation problems themselves.
What are your thoughts about this?
Thank you very much! :)
Best regards Robert
I agree with Garve here. I feel the model will be more accurate if we train it with X_n constrained to be monotonic. At least this behaviour will be "understood" by the model and thus taken into account when boosting (possibly even learned by another (co-)variable).
I am currently doing a lot of research on the use of EBMs/GBMs to find heat coefficients and change-points in gas data when compared with outside air temperatures and other weather and non-weather variables. See here for some examples using piece-wise linear regression on the univariate case of temperature. I have also managed to recover change-points and some crude heating coefficients from the EBM models as well, but only when the data is very well behaved, or a good deal of care is taken cleaning it beforehand. I was planning to do a detailed write-up on this and propose a Python notebook example on how to treat the model after training, but it seems this is a good time to raise one of my findings/thoughts on monotonicity:
In the post-processing, one of the issues is that, if there is a sizeable negative step, then in the monotonically increasing case a smoother doesn't know which way to smooth it. At the x = 22.5 mark here, we see that a single anomalous reading has caused an undesired jump in the final level. This means all values x > 23 predict approximately 5 units too high. This is also true at the far left side of the graph, where x = 0 should give y ≈ 0.
I am now experimenting with weighted smoothing as a post-processing step, however, it seems rather more tricky and requires the original training data. Thus, it seems better to treat this at training time!
FYI - to save potential confusion, this case is not another variable vs gas, but one that is KNOWN to be monotonically increasing (vs temperature which is decreasing).
Hi again!
I implemented a very naive proof-of-concept version of an ExplainableBoostingMetaRegressor that takes any base regressor as input, see here on my GitHub. I can even give each feature its own base regressor.
Is it an option to implement it like this, just more efficiently? :D
To come back to the original problem: If I want monotonically increasing behavior in some features, I can give it an IsotonicRegression() from scikit-learn. If I want it decreasing, I give it an IsotonicRegression(increasing=False). If I need positive values, I can give it an IsotonicRegression(y_min=0), etc.
If I don't specify anything, it uses a DecisionTree with some small depth. Seems to work well!
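Not the code from the linked repo, but a minimal sketch of the underlying idea, under the assumption that it boosts round-robin over features, fitting each feature's own 1-D base regressor to the current residuals (function and parameter names here are illustrative only):

import numpy as np
from sklearn.base import clone
from sklearn.isotonic import IsotonicRegression

def fit_monotone_additive(X, y, base_regressors, n_rounds=100, lr=0.1):
    """Round-robin ("cyclic") boosting: in every round, each feature's own 1-D
    base regressor is fit to the current residuals and added with a small weight."""
    stages = [[] for _ in range(X.shape[1])]           # per-feature boosted stages
    pred = np.full(len(y), float(np.mean(y)))          # start from the mean
    for _ in range(n_rounds):
        for j, proto in enumerate(base_regressors):
            reg = clone(proto).fit(X[:, j], y - pred)  # fit residuals on feature j only
            pred = pred + lr * reg.predict(X[:, j])
            stages[j].append(reg)
    return float(np.mean(y)), stages

# Example: feature 0 constrained to be increasing, feature 1 decreasing.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=2000)
base = [IsotonicRegression(out_of_bounds="clip"),
        IsotonicRegression(increasing=False, out_of_bounds="clip")]
intercept, stages = fit_monotone_additive(X, y, base)

Because each per-feature contribution is a sum of monotone 1-D functions, the learned shape for a constrained feature stays monotone by construction.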
Again, a word of caution: the regressor seems to work, but I haven't tested it very much so far. It's also not really efficient and doesn't work together with the show() function of interpret. It also doesn't support interactions so far. You can, however, get the nice graphs using the domains_ and outputs_ attributes, i.e.
import matplotlib.pyplot as plt

e = ExplainableBoostingMetaRegressor()
e.fit(X, y)
for i in range(X.shape[1]):  # one graph per feature
    plt.plot(e.domains_[i], e.outputs_[i])
    plt.title(i)
    plt.show()
I also didn't check how you guys implemented it, I just checked out this youtube video of how the algorithm works at a high level and tried to replicate this in code.
Best regards Robert
I also have a need for monotonicity, and would prefer to have it enforced during training. I don't really have a solution other than those that have been mentioned, but posting to +1 the column of demand. David
Hi David, Do you need monotonicity for just one or a few variables, or for all variables?
Hi Rich, I think we've talked before on this topic some months ago. I would need monotonicity on all features, ultimately, as I'm in a regulated space.
@richcaruana all of the above. Something like:
monotonic = None/0 implies no constraints
monotonic = 1 implies all increasing
monotonic = -1 implies all decreasing
monotonic = [1, 0, 1, -1, 0] implies [increasing, no constraint, increasing, decreasing, no constraint] for an X with 5 features.
@paulsendavidjay: thanks for reminding me of our previous discussion. Completely agree with you that if you need monotonicity on all features then the best way to achieve that is via constraints imposed during training. Not sure how quickly we'll have that implemented, but it is on our radar.
@JoshuaC3: the interface you suggest (-1 = decreasing, 0 = no constraint, +1 = increasing) makes sense. Adding constraints to only a subset of features doesn't always achieve the effect you want. If there is no correlation among features, then imposing constraints per feature works exactly as you would expect, but in the usual case where there is correlation among features, learning will do everything it can to get around the monotonicity constraints while still appearing to be monotone on the features you constrained.
For example, imagine you have two copies of a feature (but aren't aware of it) and put a monotonicity constraint on one of the features, but not both. The model will satisfy the constraint on the feature you apply the constraint to, but will use the other, unconstrained copy of that feature to undo what it has learned on the constrained feature. So in the end it is not correct to think of the model as being monotone on the constrained features, since the model has used correlation among the features to undo that monotonicity. There are almost always many correlations among features in complex datasets, so this is a real problem and makes applying monotonicity constraints to subsets of features problematic. And this is a problem with monotonicity constraints for all learning methods, not just EBMs. At least the effects are more visible with glassbox methods like ours.
@richcaruana I should have said, it is the interface used by LightGBM, CatBoost and XGBoost.
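For anyone unfamiliar with that convention, a minimal LightGBM example (assuming a reasonably recent LightGBM version; XGBoost and CatBoost accept the same -1/0/+1 per-feature codes through their own monotone-constraint parameters). This is only to illustrate the encoding, not an interpret API:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=1000)

# One code per feature: +1 = increasing, 0 = unconstrained, -1 = decreasing.
model = lgb.LGBMRegressor(monotone_constraints=[1, 0, -1])
model.fit(X, y)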
I hadn't considered the collinearity effects for monotonic constraints in general here - what an excellent insight!! That said, I don't think it should cause many issues. Checking for collinearity is something an ML practitioner should do as standard as part of EDA/train-test-split/feature selection. Additionally, a domain expert of the kind who is likely to set monotonic constraints should understand which of their independent variables are monotonically correlated with the dependent variable and with one another.
In my main use case, the latter is certainly true. I know from the physics of the system I am predicting that the independent variables are all either positively or negatively monotonic. I intend to share my use case at some point, as I feel it will be interesting and will stimulate the discussion further!
Your last point is very pertinent - the fact that this is a glassbox model and has the rest of the interpret toolkit (counterfactuals etc) allows you to understand if/when this behaviour occurs. This is EXACTLY why I wish to use EBM over some of the more established GBMs with monotonic constraints! :D
I was considering @richcaruana's concern above: collinear, correlated or highly descriptive independent variables. Depending on the application and the reason for wanting to constrain some variable to be monotonic, including 2nd order terms could cause problems.
Some ideas for how to control for this would be as follows (a rough sketch of the combination rules is given after the list):
- Exclude constrained variables from 2nd order features: monotonic_second_order='exclude'.
- If both are 1, then have the 2nd order as 1; if both are -1, have the 2nd order as -1. A mix of 0, 1 or -1, 0 could then be strictly constrained to 1 and -1 respectively. Finally, 1 and -1 would be 0: monotonic_second_order='strict'.
- Or, as above, but with the mixed cases 0, 1 and -1, 0 being weakly constrained to 0 and 0: monotonic_second_order='weak'.
- Finally, ignore the constraints on the variables: monotonic_second_order='ignore'.
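A rough sketch of those proposed combination rules as a hypothetical helper (the 'exclude' and 'ignore' options would simply skip or bypass this mapping; none of this is an existing interpret API):

def pairwise_constraint(c1, c2, mode='strict'):
    """Combine two per-feature monotonicity codes (-1, 0, +1) into a constraint
    code for their 2nd-order (pairwise) term, following the rules proposed above."""
    if c1 == c2:                 # (1, 1) -> 1, (-1, -1) -> -1, (0, 0) -> 0
        return c1
    if 0 in (c1, c2):            # one feature constrained, the other free
        return (c1 + c2) if mode == 'strict' else 0
    return 0                     # opposite signs: no sensible joint constraint

# Examples
assert pairwise_constraint(1, 0, mode='strict') == 1
assert pairwise_constraint(1, 0, mode='weak') == 0
assert pairwise_constraint(1, -1) == 0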
Hi @Garve, @JoshuaC3, and @paulsendavidjay,
Thanks for the spirited discussion around this! Wanted to add to this thread with some utility code that post-processes any main effect graph to enforce monotonicity (after training):
from sklearn.isotonic import IsotonicRegression
from copy import deepcopy
import plotly.graph_objects as go
import numpy as np
def make_monotone(ebm, feature, direction='auto', inplace=False, visualize_changes=True):
    ''' Adjusts an individual feature to be monotone using isotonic regression.

    Args:
        ebm: Fitted ExplainableBoostingClassifier or ExplainableBoostingRegressor.
        feature: Index or name of continuous univariate feature to apply monotone constraints to.
        direction: 'auto', 'increasing' or 'decreasing'. Auto decides sign based on Spearman correlation estimate.
        inplace: If True, modifies existing EBM in place. If False, returns new EBM.
        visualize_changes: Produces Plotly visualization highlighting edits.

    Returns:
        If not inplace, returns new EBM with monotonicity constraints.
    '''
    if isinstance(feature, str):  # Find feature index if passed as string
        feature_index = ebm.feature_names.index(feature)
    else:
        feature_index = feature

    x = np.array(range(len(ebm.additive_terms_[feature_index])))
    y = ebm.additive_terms_[feature_index]
    w = ebm.preprocessor_.col_bin_counts_[feature_index]

    # Fit isotonic regression weighted by training data bin counts
    direction = 'auto' if direction not in ['increasing', 'decreasing'] else direction == 'increasing'
    ir = IsotonicRegression(out_of_bounds="clip", increasing=direction)
    y_ = ir.fit_transform(x, y, sample_weight=w)

    ebm_mono = deepcopy(ebm)
    ebm_mono.additive_terms_[feature_index][1:] = y_[1:]

    # Plot changes to model
    if visualize_changes:
        ebm_global = ebm.explain_global()
        trace = ebm_mono.explain_global().visualize(feature_index)
        trace['data'][1]['line']['color'] = 'red'
        trace['data'][1]['name'] = "Monotone"

        source_layout = ebm_global.visualize(feature_index)['layout']
        source_data = list(ebm_global.visualize(feature_index)['data'])
        source_data = [source_data[index] for index, trace in enumerate(source_data)
                       if trace.name in ["Main", "Distribution"]]
        source_data[0]['fill'] = None
        source_data.append(trace['data'][1])
        source_layout['showlegend'] = True

        fig_mono = go.Figure(
            data=source_data,
            layout=source_layout
        )
        fig_mono.show()

    # Modify in place or return copy
    if inplace:
        ebm.additive_terms_[feature_index][1:] = y_[1:]
    else:
        return ebm_mono
Here's a quick usage example:
modified_ebm = make_monotone(ebm, feature='Age', direction='auto', inplace=False, visualize_changes=True)
which produces a new EBM and a visualization (if visualize_changes=True) highlighting the changes made to the model. You can also modify an existing EBM in place with the inplace flag.
This function isn't fully featured or tested yet, but we wanted to share it here first to provide a temporary solution and get feedback. As @JoshuaC3 points out, this also may not enforce true monotone constraints when pairwise interactions containing the feature are present -- maybe we should throw a warning in those cases, or explore ways to postprocess constraints on pairwise interaction terms?
We don't intend for this to be a replacement for monotone constraints at training time, but it could be a nice supplemental utility function for the cases where monotonicity via post-processing makes sense. It'd be useful for us to hear whether this function works on your problems as we work on training-time constraints!
-InterpretML Team
@interpret-ml Very nice! I had spent some time a while ago looking at just such a post-processing method but was having difficulties accessing the right data given my unfamiliarity with the objects, and had to drop it to work on other business items. This is a great solution that could be applied to many business cases, with a clear visualization of the trade-off. Thank you for such a quick turnaround!
From the above code I get the following error: AttributeError: 'EBMPreprocessor' object has no attribute 'col_bin_counts_'
I modified the code by replacing 'col_bin_counts_' with 'col_bin_edges_' and it looks good!
Hi @paulsendavidjay,
Same to you -- thanks for the quick feedback! It's a bit surprising that your EBMPreprocessor doesn't have the col_bin_counts_ attribute exposed. Any chance you can check what version of interpret you're on? 0.2.4 (our latest release) should have support for this.
From the command line:
pip show interpret
or in a python environment:
import interpret
interpret.__version__
should both show the version number. If you can upgrade, pip install -U interpret
should do the trick. It won't make a big difference, but using the counts instead of the edges for weighting the isotonic regression would help the algorithm make better tradeoffs. Thanks again for testing it out so quickly!
Having given some further thought to the discussion here, I have raised the above issue. I think this would address some of the fears we had around constrained variables when used in 2nd order features, as well as 2nd order features in regulated spaces.
Hi @interpret-ml:
Are you still planning to add monotonic constraints during training to the algorithm? It would be great if this feature could be implemented, since domain knowledge is crucial when a model is used in practice.
Thank you.
Hi @interpret-ml I second @flippercy's comment. I am working in the insurance industry, and monotonic constraints are very important. Do we plan to add this to EBM soon?
Thank you.
Hey @huanvo88, I plan to work on EBM monotonicity through post-processing.
Just curious, are there laws or regulations that require insurance companies to use monotone ML models? If so, could you please point me to some related documents?
Hi @xiaohk , I think insurance in Canada is more regulated, and I am not dealing with filing so I don't have any legal documents to give you. But sometimes when we present the models to the business, they would require certain features to be increasing or decreasing. From the discussion on this thread it seems it is better to incorporate the constraint in the fitting (like XGBoost or Lightgbm) rather than a post processing, but we can use post processing if there is no better alternative.
Got it @huanvo88 , thanks!
If you only want certain features (not all) to be increasing or decreasing, post-processing might be a better solution than monotonic constraint. You can see https://github.com/interpretml/interpret/issues/184#issuecomment-822702385
Ah ok I see, thanks @xiaohk. Also just out of curiosity, in Xgboost, lightgbm, and catboost they also have the monotone constraints, I assume that is also post processing? Or did they implement it during the fitting process?
They implement it as a monotonicity constraint during training. I believe monotonicity constraint during EBM training is on the development roadmap too.
@xiaohk it is good to know that it is on the development roadmap. So I assume for now you will work on the monotone post processing and push it to the next release?
@huanvo88 During the fitting process, if the direction of the identified split L > R is different from the constraint L < R, then a split is simply not made.
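A minimal, purely illustrative sketch of that rule for a single candidate split (not interpret's or LightGBM's actual internals): compute the candidate left/right leaf values and reject the split if they violate the requested direction.

import numpy as np

def split_respects_constraint(y_left, y_right, constraint):
    """constraint: +1 (increasing), -1 (decreasing), 0 (unconstrained).
    The candidate split's left/right leaf values must be ordered accordingly;
    otherwise the split is simply not made."""
    left_value, right_value = np.mean(y_left), np.mean(y_right)
    if constraint == +1:
        return left_value <= right_value
    if constraint == -1:
        return left_value >= right_value
    return True  # unconstrained

# Example: targets below the threshold are higher on average, so an 'increasing'
# constraint rejects this split while a 'decreasing' constraint accepts it.
y_left, y_right = np.array([3.0, 2.5]), np.array([1.0, 0.5])
print(split_respects_constraint(y_left, y_right, +1))  # False -> split not made
print(split_respects_constraint(y_left, y_right, -1))  # True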
In regulated spaces, we are often required to give plain-language explanations for adverse decisions based on model scores. Business leaders need to make sure that these explanations are sensible. For example, it would make sense to say that 'you were declined a loan offer because your total debt is too high' if debt is the most impactful feature in that model for that individual. But if debt had a U-shaped pattern, it could happen that the explanation becomes 'you were declined a loan because your debt is both too high and too low', which would not make sense. Monotonic constraints eliminate that possibility with rare exception.
@huanvo88 My stuff is still work-in-progress, but I will keep you updated. If you are interested, I can also show you the pre-release version in the next few weeks. I'd really love to get some feedback from you :)
For now, I suggest just using isotonic regression to find the best monotonic shape of your learned shape function. The code is included in https://github.com/interpretml/interpret/issues/184#issuecomment-822844554.
Hey @paulsendavidjay, thanks for the reply! Your example makes a lot of sense. Just out of curiosity, what is the rare exception where monotonic constraint doesn't help?
Awesome, thank you.
You can check this out: https://cs.stackexchange.com/questions/69220/random-forests-on-monotone-training-set-yields-a-monotone-classifier
It's not what I was thinking of, which was a paper demonstrating a more clever example of training a GBM with monotonic constraints that specifically violates monotonicity in the final model. I'm unable to find the reference, however.
An interesting paper on better monotonic splits in Trees: https://arxiv.org/pdf/2011.00986.pdf
Having quickly read the paper, my initial understanding is that it improves on the monotonicity constraints as follows:
Then, when we make any split (monotone or not) in a branch having a monotone node as a parent somewhere, after making the split, we need to check that the new outputs are not violating any constraint on other leaves of the tree. The general idea is that we should start from the node where a split was just made, go up the tree, and every time a monotone node is encountered, we should go down in the opposite branch and check that the constraints and the new outputs from the new split are compatible. If they are not, then the constraints need to be updated. Therefore making a split in a branch can very well update the constraints of other leaves in another branch.
My intuition tells me that this may only be used at the 2nd order interaction terms stage of training.
Additionally, if my intuition is correct, the very small impact on training time would matter even less, as the "opposite-branch check" highlighted in the quote above would only need to run on a small subset of cases.
Finally, I accept that, because it might be used on only a small subset of cases, it might not be worth implementing for a potentially small accuracy improvement. Nonetheless, it would be interesting to test and find out!
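To make the quoted procedure a little more concrete, here is a rough, hypothetical sketch of the "walk up the tree and tighten the opposite branch" step; the Node class and the bound bookkeeping are illustrative only and not how LightGBM, the paper's implementation, or EBMs actually store trees.

import math

class Node:
    """A stripped-down tree node: just enough structure to walk up to monotone ancestors."""
    def __init__(self, parent=None, monotone=0):
        self.parent = parent            # parent Node, or None for the root
        self.monotone = monotone        # +1/-1 if this node splits on a constrained feature
        self.left = self.right = None
        self.lower, self.upper = -math.inf, math.inf   # allowed output range of this subtree

def update_opposite_branches(leaf, new_min, new_max):
    """After a split gives `leaf` outputs in [new_min, new_max], walk up the tree and,
    at every monotone ancestor, tighten the output bounds of the opposite branch so
    the whole tree stays monotone (a rough paraphrase of the procedure quoted above)."""
    node, child = leaf.parent, leaf
    while node is not None:
        if node.monotone != 0:
            opposite = node.right if child is node.left else node.left
            if (child is node.left) == (node.monotone == +1):
                # We are on the "low" side of the split: the other side must stay above us.
                opposite.lower = max(opposite.lower, new_max)
            else:
                # We are on the "high" side: the other side must stay below us.
                opposite.upper = min(opposite.upper, new_min)
        node, child = node.parent, node

# Tiny example: the root splits on an increasing-constrained feature. Updating the
# left branch's outputs tightens the lower bound of the right branch.
root = Node(monotone=+1)
root.left, root.right = Node(parent=root), Node(parent=root)
update_opposite_branches(root.left, new_min=-0.5, new_max=0.3)
print(root.right.lower)   # 0.3 -> the right branch may no longer predict below 0.3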