How to calculate SRI for nonlinear models?
@mtsokol
Related: https://github.com/BCG-X-Official/facet/issues/374
Thank you. I have modified your code to work with non-linear models such as KernelRidge.
KernelRidge is naturally not compatible with TreeExplainerFactory, so I considered using KernelExplainerFactory or ExactExplainerFactory instead. Since ExactExplainerFactory can become infeasible depending on the size of the dataset, I adopted KernelExplainerFactory together with shap_interaction=True on the inspector.
In this case, a RuntimeError occurs: RuntimeError: SHAP interaction values have not been calculated. Create an inspector with parameter 'shap_interaction=True' to enable calculations involving SHAP interaction values.
Checking your implementation, it seems that KernelExplainerFactory does not compute SHAP interaction values:
https://github.com/BCG-X-Official/facet/blob/66bea1574e7a05e8db13cc25b5f071a260d0f66b/src/facet/explanation/_explanation.py#L377
https://github.com/BCG-X-Official/facet/blob/66bea1574e7a05e8db13cc25b5f071a260d0f66b/src/facet/inspection/_learner_inspector.py#L139
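A minimal sketch of how to check this up front, assuming the factories are exported from facet.explanation and that the supports_shap_interaction_values property referenced in the linked source is part of the public factory API (the property name may differ between FACET versions):

from facet.explanation import (
    ExactExplainerFactory,
    KernelExplainerFactory,
    TreeExplainerFactory,
)

# report, for each factory, whether its explainers can produce SHAP
# interaction values (property name assumed from the linked FACET source)
for factory in (TreeExplainerFactory(), KernelExplainerFactory(), ExactExplainerFactory()):
    print(type(factory).__name__, factory.supports_shap_interaction_values)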
I have two questions:
1. For non-linear models, is it necessary to use ExactExplainerFactory when fitting the inspector? What should I do if the dataset is large?
2. The fact that KernelExplainerFactory internally converts shap_interaction=True to False is confusing. Would it be better to raise an error when shap_interaction=True is specified, or to disallow the shap_interaction argument altogether?
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.kernel_ridge import KernelRidge

# relevant FACET imports
from facet.data import Sample
from facet.explanation import KernelExplainerFactory
from facet.inspection import NativeLearnerInspector

# load the diabetes data into a single DataFrame
data = load_diabetes()
X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")
diabetes_df = pd.concat([X, y], axis=1)

# create FACET sample object
diabetes_sample = Sample(observations=diabetes_df, target_name="target")

# fit a non-linear kernel ridge regressor
model = KernelRidge()
model.fit(X, y)

# fit the model inspector
inspector = NativeLearnerInspector(
    model=model,
    explainer_factory=KernelExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
inspector.fit(diabetes_sample)

# visualise synergy as a matrix
from pytools.viz.matrix import MatrixDrawer
synergy_matrix = inspector.feature_synergy_matrix()

# visualise redundancy as a matrix
redundancy_matrix = inspector.feature_redundancy_matrix()

# visualise redundancy using a dendrogram
from pytools.viz.dendrogram import DendrogramDrawer
redundancy = inspector.feature_redundancy_linkage()
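For context on question 1, the ExactExplainerFactory route would look roughly as sketched below; fitting the inspector on a random subsample of the observations is one way to keep the exact explainer tractable when the dataset is large (the subsample size and the ExactExplainerFactory defaults here are illustrative assumptions, not a recommendation):

from facet.explanation import ExactExplainerFactory

# fit on a random subsample of the observations to keep the exact
# explainer tractable (n=200 is an arbitrary choice for illustration)
subsample_df = diabetes_df.sample(n=200, random_state=42)
subsample = Sample(observations=subsample_df, target_name="target")

exact_inspector = NativeLearnerInspector(
    model=model,
    explainer_factory=ExactExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
exact_inspector.fit(subsample)

# interaction-based statistics should now be available
exact_synergy_matrix = exact_inspector.feature_synergy_matrix()
exact_redundancy_matrix = exact_inspector.feature_redundancy_matrix()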
Sorry @jckkvs, this slipped through.
Re (2): I totally agree that an exception would be better - we will make that change.
Re (1): FACET relies on the shap package for all SHAP calculations. I agree it would be great to see support for interaction values for a broader set of models; it is probably best to raise that with the shap maintainers. Alternatively, we may consider adding our own interaction explainer to a future version of FACET (in our own work we find that ensemble models work great, so we're fine using the TreeExplainer).
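For reference, a minimal sketch of that tree-ensemble route, reusing X, y and diabetes_sample from the snippet above (the RandomForestRegressor settings are arbitrary; TreeExplainerFactory is used because it supports SHAP interaction values):

from sklearn.ensemble import RandomForestRegressor
from facet.explanation import TreeExplainerFactory
from facet.inspection import NativeLearnerInspector

# a tree ensemble is compatible with TreeExplainerFactory
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X, y)

rf_inspector = NativeLearnerInspector(
    model=rf,
    explainer_factory=TreeExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
rf_inspector.fit(diabetes_sample)

rf_synergy_matrix = rf_inspector.feature_synergy_matrix()
rf_redundancy_matrix = rf_inspector.feature_redundancy_matrix()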
That's right, for SHAP interaction values I believe you are mostly limited to tree-based models.