
Path-dependent TreeShap for LightGBM - `fit` and `_build_explanation`

Open RobertSamoilescu opened this issue 2 years ago • 2 comments

There seems to be an edge case that is not handled by our implementation.

For lightgbm, only the path-dependent method is supported when categorical features exist (i.e. when the pd.DataFrame has columns of type `category`). See link here.

The interventional method is not supported because shap doesn't know how to deal with categorical features; one has to one-hot encode (OHE) them to make it work.
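
A minimal reproduction sketch of the setup being discussed (toy data and column names are assumed for illustration):

```python
import pandas as pd
import lightgbm as lgb
from alibi.explainers import TreeShap

# Toy frame with one numeric and one categorical column.
X = pd.DataFrame({
    "num": [0.1, 0.5, 0.9, 0.3, 0.7, 0.2],
    "cat": pd.Categorical(["a", "b", "a", "b", "a", "b"]),
})
y = [0, 1, 0, 1, 0, 1]

model = lgb.LGBMClassifier(n_estimators=5, min_child_samples=1).fit(X, y)

explainer = TreeShap(model, model_output="raw")
explainer.fit()                      # no background data -> path-dependent
explanation = explainer.explain(X)   # currently breaks as described below
```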

For the path-dependent approach, there are a few parameters that are left unset, one of them being `num_outputs`, which is what breaks the code in the first place. Those params are only set when `self.trees` is not None. See link here.
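
For reference, a hedged sketch of where the missing parameter could be patched, continuing the snippet above and assuming shap's `TreeExplainer` internals (`explainer.model` is the `TreeEnsemble` wrapper). This illustrates the failure point, it is not a confirmed fix:

```python
import shap

explainer = shap.TreeExplainer(model)   # path-dependent: no background data

# When shap could not parse the trees (self.trees is None), num_outputs is
# never set; fill it in from the booster so downstream code can proceed.
if explainer.model.trees is None and not hasattr(explainer.model, "num_outputs"):
    explainer.model.num_outputs = model.booster_.num_model_per_iteration()
```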

If we fix that, another problem arises in the `_build_explanation` method: `_build_explanation` calls the `predict` function to compute the raw predictions returned in the explanation (see link here). The `predict` function uses a `TreeEnsemble` wrapper defined in shap, which doesn't work because it relies on a C extension that also doesn't know how to handle categorical features. See link here.
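
One possible direction for the predict step, sketched under the assumption that bypassing shap's wrapper is acceptable here: take the raw predictions from lightgbm directly, since the booster handles categorical splits natively.

```python
# Raw-margin predictions straight from lightgbm, avoiding shap's
# TreeEnsemble.predict and its C extension.
raw_predictions = model.predict(X, raw_score=True)       # sklearn API
# equivalently: model.booster_.predict(X, raw_score=True)
```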

RobertSamoilescu avatar Feb 10 '23 13:02 RobertSamoilescu

These seem like two different issues. Since the interventional method is not supported, fitting with a dataset should not work (can we raise an error if fit is called with arguments?). Additionally, in the path-dependent case we might also want to raise an error if the explain step is called with e.g. a pd.DataFrame containing categorical values?
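
A sketch of the kind of validation suggested here; the helper name and wiring are hypothetical, not actual alibi code:

```python
import pandas as pd

def _check_lightgbm_categorical(X: pd.DataFrame, interventional: bool) -> None:
    """Hypothetical guard: reject unsupported lightgbm + categorical combos."""
    has_categorical = any(str(dtype) == "category" for dtype in X.dtypes)
    if has_categorical and interventional:
        raise NotImplementedError(
            "Interventional TreeShap does not support lightgbm models with "
            "categorical features; one-hot encode them, or call fit() "
            "without data to use the path-dependent method."
        )
```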

The other issue concerns the case where fit is called correctly without arguments, so that the path-dependent method is used. Is it correct to say that things don't quite work here because we perform an additional predict call? It's not quite clear to me what the required fix is and how it interferes with this predict call.

jklaise avatar Feb 13 '23 13:02 jklaise

There might be a related issue with explaining catboost models.

There, the categorical features are transformed internally inside the model (docs). The input to explain() should therefore not be encoded, and consequently _build_explanation fails in this case as well.

The shap library is able to output explanations by first converting the input data to a catboost.Pool that handles the transformations (see here). I would appreciate it if something similar could be added to your wrapper as well.
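
A hedged sketch of the shap-style handling described above, with a toy model and assumed column names; the wrapper would presumably build the Pool internally before predicting:

```python
import pandas as pd
import catboost

# Assumed toy data; "city" and "colour" are the categorical columns.
X = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],
    "colour": ["red", "blue", "red", "blue"],
    "num": [1.0, 2.0, 3.0, 4.0],
})
y = [0, 1, 0, 1]

model = catboost.CatBoostClassifier(iterations=5, verbose=False)
model.fit(X, y, cat_features=["city", "colour"])

# Wrap the raw (unencoded) frame in a Pool so the model applies its
# internal categorical transformations before predicting.
pool = catboost.Pool(X, cat_features=["city", "colour"])
raw_predictions = model.predict(pool, prediction_type="RawFormulaVal")
```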

anh-le-profinit avatar Jun 14 '23 12:06 anh-le-profinit