Path dependent TreeShap for LightGBM - `fit` and `_build_explanation`
There seems to be an edge case which is not considered in our implementation.
For `lightgbm`, only the path-dependent method is supported when categorical features exist (i.e. the `pd.DataFrame` has columns of type `category`). See link here.
The interventional method is not supported because `shap` doesn't know how to deal with categorical features; one has to one-hot encode (OHE) them to make it work.
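For reference, here is a minimal sketch of the setup being described (toy data; the `alibi.explainers.TreeShap` call pattern is my assumption of the intended usage):

```python
# Minimal sketch (toy data): a LightGBM model trained on a DataFrame with a
# `category` column, explained with the path-dependent TreeShap algorithm.
import lightgbm as lgb
import numpy as np
import pandas as pd
from alibi.explainers import TreeShap

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "num": rng.normal(size=200),
    "cat": pd.Categorical(rng.choice(["a", "b", "c"], size=200)),  # category dtype
})
y = (X["num"] > 0).astype(int)

model = lgb.LGBMClassifier(n_estimators=20).fit(X, y)

explainer = TreeShap(model, model_output="raw", task="classification")
explainer.fit()                      # no background data -> path-dependent algorithm
explanation = explainer.explain(X)   # this is where the problems below surface
```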
For the path-dependent approach, there are a few parameters that are not set, one of them being `num_outputs`, which breaks the code in the first place. Those parameters are only set when `self.trees` is not None. See link here.
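To make the pattern concrete, here is a hypothetical illustration (made-up names, not the actual alibi source) of attributes only being populated when background data was supplied:

```python
# Hypothetical sketch of the guard described above (not the actual alibi code).
class FitSketch:
    def fit(self, background_data=None):
        self.trees = background_data
        if self.trees is not None:
            # interventional case: attributes such as num_outputs are set here
            self.num_outputs = 1
        # path-dependent case (background_data is None): num_outputs is never set,
        # so later code that reads it breaks
        return self
```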
If we fix that, another problem arises in the `_build_explanation` method. `_build_explanation` calls the `predict` function to compute the raw predictions returned in the explanation (see link here). The `predict` function uses a `TreeEnsemble` wrapper defined in `shap`, which doesn't work because it relies on a C extension (`cext`) that also doesn't know how to handle categorical features. See link here.
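Not a confirmed fix, but for illustration, one conceivable direction is to obtain the raw predictions from the LightGBM model itself, which handles `category` columns natively, instead of routing them through `shap`'s `TreeEnsemble`/`cext` path:

```python
# Sketch only (toy data): raw margins computed directly with lightgbm, bypassing
# shap's TreeEnsemble wrapper and its C extension.
import lightgbm as lgb
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "num": rng.normal(size=200),
    "cat": pd.Categorical(rng.choice(["a", "b", "c"], size=200)),
})
y = (X["num"] > 0).astype(int)
model = lgb.LGBMClassifier(n_estimators=20).fit(X, y)

# `category` columns are handled natively; raw_score=True returns the raw margins
# that the explanation needs, without going through the cext.
raw_margins = model.predict(X, raw_score=True)
```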
These seem like two different issues. Since the interventional method is not supported, fitting with a dataset should not work (can we raise an error if `fit` is called with arguments?). Additionally, in the path-dependent case we might also want to raise an error if the `explain` step is called with e.g. a `pd.DataFrame` containing categorical values?
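For illustration, a rough sketch of what such guards might look like, written as standalone helpers with made-up names (not a proposed final implementation):

```python
# Hypothetical validation guards (names and placement are assumptions).
import pandas as pd


def check_fit_args(background_data, has_categorical_features: bool) -> None:
    # Interventional TreeShap cannot handle LightGBM categorical features, so
    # reject background data (i.e. fit() arguments) in that case.
    if background_data is not None and has_categorical_features:
        raise NotImplementedError(
            "Interventional TreeShap is not supported for LightGBM models with "
            "categorical features; call fit() without background data instead."
        )


def check_explain_input(X) -> None:
    # In the path-dependent case, raw `category` columns break the downstream
    # shap machinery, so fail early with an informative error.
    if isinstance(X, pd.DataFrame) and any(
        isinstance(dtype, pd.CategoricalDtype) for dtype in X.dtypes
    ):
        raise ValueError(
            "explain() received a pd.DataFrame with `category` columns, which "
            "the underlying shap implementation cannot handle."
        )
```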
The other issue concerns the case when `fit` is called correctly without arguments so that the path-dependent method is used. Is it correct to say that the reason things don't quite work here is that we perform an additional `predict` call? It's not quite clear to me what the required fix is and how it interacts with this `predict` call.
There might be a related issue with explaining `catboost` models. There, the categorical features are transformed internally by the model (docs). The input to `explain()` should then not be encoded, and consequently `_build_explanation` fails in this case as well.
The `shap` library is able to output explanations by first converting the input data to a `catboost.Pool`, which handles the transformations (see here).
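For reference, a self-contained sketch of that conversion (toy data and column names are assumptions): the raw, unencoded input is wrapped in a `catboost.Pool` with the categorical columns declared, so the model applies its internal transformations before prediction.

```python
# Sketch only (toy data): convert unencoded input to a catboost.Pool before
# computing raw predictions, mirroring what the shap library does for CatBoost.
import catboost
import pandas as pd

X = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 0.3, 1.7, 0.9, 2.1],
})
y = [0, 1, 0, 1, 1, 0]

model = catboost.CatBoostClassifier(iterations=10, verbose=False)
model.fit(X, y, cat_features=["colour"])

# The Pool handles the categorical transformations, so the input stays unencoded.
pool = catboost.Pool(X, cat_features=["colour"])
raw_predictions = model.predict(pool, prediction_type="RawFormulaVal")
```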
I would appreciate it if something similar could be added to your wrapper as well.