
Feature Suggestion / Request

Open arainboldt opened this issue 2 years ago • 2 comments

Really enjoy the work done on this library. Very interesting implementations and makes EDA a lot more fun!

I was wondering if there are any thoughts about implementing tooling for interpretable feature reduction.

The reason for asking is that I often work with very wide data, i.e., 20k+ features, and feature reduction is an inescapable part of the modelling pipeline. I'd love to use this repo more, but the opacity of most feature reduction methods unfortunately limits the insights it can offer.

One initial idea I'm working on implementing is a weighted feature map using Factor Analysis, but something not constrained to linear relations, e.g. UMAP, would be ideal. Unfortunately, I sense that this isn't really possible.

If this inspires any of the contributors to incorporate something similar in the repo, that'd be awesome, but no worries if not.

Any thoughts or direction are much appreciated!

Thanks again!

arainboldt avatar Aug 26 '22 12:08 arainboldt

Hi @arainboldt -- I'm glad to hear you're having fun using InterpretML. We’re certainly having fun developing it. :)

The easiest way to achieve this currently is to build an initial EBM model, sort the features by feature importance, drop the features below a threshold, and then retrain a new EBM on the restricted dataset (a rough sketch follows below). If you want to base this on a metric like log loss, put it in a loop. I'm hesitant to make this an automatic or built-in functionality since it's fairly easy to do externally, and also because there are many different ways you could sort the features by importance. Given the interpretability of EBMs, a human might even be included in the feature reduction process. Our next release will start to expose new ways of measuring feature importance, which should make this process even more interesting.
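For illustration, here is a minimal sketch of that threshold-and-retrain approach. The `reduce_features` helper, the `threshold` value, and `X`/`y`/`feature_names` are all placeholders (not part of interpret), and the exact accessor for overall importances may differ between interpret releases.

```python
# Hypothetical sketch of the threshold-and-retrain loop described above.
# reduce_features, threshold, X, y, and feature_names are placeholders;
# the importance accessor may vary between interpret releases.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

def reduce_features(X, y, feature_names, threshold=0.01):
    # 1. Fit an initial EBM on the full feature set.
    ebm = ExplainableBoostingClassifier(feature_names=feature_names)
    ebm.fit(X, y)

    # 2. Pull overall term importances from the global explanation and keep
    #    only the original (main-effect) features that clear the threshold.
    overall = ebm.explain_global().data()
    keep = [name for name, score in zip(overall["names"], overall["scores"])
            if name in feature_names and score >= threshold]  # ignore interaction terms

    # 3. Retrain a fresh EBM on the restricted dataset.
    cols = [feature_names.index(name) for name in keep]
    ebm_small = ExplainableBoostingClassifier(feature_names=keep)
    ebm_small.fit(np.asarray(X)[:, cols], y)
    return ebm_small, keep
```

Wrapping this in a loop that tracks a held-out metric such as log loss would give the iterative version mentioned above.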

Since EBMs are additive models, there is an alternative: simply remove features from the fitted model without retraining. Retraining would generally be expected to return a better model, since the remaining features could then absorb signal correlated with the excluded ones, but in most cases the results should be close. This option would be much faster, and could therefore allow finer-grained decisions. We don't currently offer such a postprocessing utility. At some point we plan to create a collection of model editing utilities, and this capability should be on that list. Until then, we'd welcome such a contribution if someone were to write it.
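As a rough sketch of what such a post-hoc edit could look like: `drop_terms` below is a hypothetical helper, not part of interpret, and it assumes the fitted model exposes per-term score arrays via `term_names_`/`term_scores_` (attribute names have shifted across releases). It zeroes a term's contribution rather than literally deleting it.

```python
# Hypothetical sketch of post-hoc term removal from a fitted EBM.
# drop_terms is NOT part of interpret; it assumes the model exposes
# term_names_ and term_scores_ (attribute names differ across releases).
import numpy as np

def drop_terms(ebm, names_to_drop):
    # Because the model is additive, zeroing a term's score array removes
    # its contribution from every prediction without any retraining.
    for i, name in enumerate(ebm.term_names_):
        if name in names_to_drop:
            ebm.term_scores_[i] = np.zeros_like(ebm.term_scores_[i])
    return ebm
```

Note that the intercept is left untouched here, so predictions would differ slightly from a model retrained without those features.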

Having said the above, I do think our package needs to make these options clearer. I've noticed that quite a few papers comparing InterpretML to other interpretability/explainability packages ding us for not providing automatic feature reduction, so your question is well placed. Perhaps we need to add more documentation covering different post-processing scenarios.

paulbkoch avatar Aug 27 '22 05:08 paulbkoch

Hey @paulbkoch thanks a lot for the prompt and thorough response. I'll explore these options and touch base here if I have any questions or updates.

Thanks!

arainboldt avatar Aug 27 '22 06:08 arainboldt