alibi
~/.local/lib/python3.6/site-packages/alibi/explainers/anchor_tabular.py in perturbation(self, anchor, num_samples)
278 )
279 nb_partial_anchors = np.array([len(n_records) for n_records in reversed(partial_anchor_rows)])
--> 280 coverage = nb_partial_anchors[-1] / self.n_records
281
282 # if there are enough train records containing the anchor, replace the original records and return...
IndexError: index -1 is out of bounds for axis 0 with size 0
Anchor Tabular
I got this error only for one specific instance being explained. The explainer had run correctly more than 10,000 times without error. The model is a CatBoost classifier. The data row is
array([-4.75903911e-01, 5.79251837e-02, -1.72716600e+00, 2.25349359e+00,
-9.09370799e-02, 6.25132153e-01, 5.55555774e-01, 6.98985644e-17,
1.00000000e+00, 0.00000000e+00, 5.00000000e+00, 2.55000000e+02,
4.00000000e+00, 9.00000000e+00, 1.00000000e+00])
The model prediction for this row also works properly ( predict_fn(X_test[idx]) ).
I can't find any issue with the data instance either. Please help me.
Looks like the variable nb_partial_anchors is empty; this could theoretically happen if there are no rows in the training set where the proposed anchor holds. This could be a bug on our side, as we should catch that error if that is what is happening.
Can you run the code in a debugger and print out the variable values? Probably starting with allowed_rows in this method: https://github.com/SeldonIO/alibi/blob/ce4d695285236fe1b638446bedcabed5e8122d1f/alibi/explainers/anchor_tabular.py#L237-L306
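For context, the failure mode described above can be reproduced in isolation: if partial_anchor_rows is empty, the derived array has size 0 and negative indexing raises exactly the error from the traceback. A minimal sketch (the empty list here is hypothetical, standing in for "no training rows match any partial anchor"):

```python
import numpy as np

# Hypothetical: no training rows satisfy any partial anchor,
# so the comprehension produces an empty array.
partial_anchor_rows = []
nb_partial_anchors = np.array([len(rows) for rows in reversed(partial_anchor_rows)])

try:
    coverage = nb_partial_anchors[-1]  # the failing line in perturbation()
except IndexError as e:
    print(e)  # index -1 is out of bounds for axis 0 with size 0
```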
Thank you for the quick reply. I'm not using a debugger at the moment, but I will upload the variable values later. Thank you!
@jklaise Fixed in alibi>=0.5.6 ?
https://github.com/SeldonIO/alibi/blob/9af1d73d046a2239ce73c12c2e1f835fecc89780/alibi/explainers/anchor_tabular.py#L281
@enricorotundo I don't think so, because you would get the same error regardless of whether the index was 0 or -1.
I think the issue may originate because the list is empty (meaning partial_anchor_rows is empty). Is this something you've also experienced?
Yes, but I've worked around it by tweaking my dataset a bit so I get different anchors and it doesn't fail.
@enricorotundo can you elaborate a little bit? Maybe we can get to the bottom of this and implement appropriate handling.
@jklaise I'm working with a customer dataset using AnchorTabular via Seldon, so I'm not sure how to provide a reproducible example. Anyhow, replacing AnchorTabular(clf.predict(x)) with AnchorTabular(clf.predict_proba(x)) magically works.
@enricorotundo hmm, that's interesting. We do internal processing to convert models outputting probabilities into ones outputting labels using an ArgmaxTransformer; I wonder if there is some bug related to that: https://github.com/SeldonIO/alibi/blob/5f1be275f8d57fac83cee0e97bc5dbaf7a95e501/alibi/explainers/anchor_tabular.py#L961-L968
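To illustrate the idea behind that conversion (this is a sketch of the concept, not alibi's actual ArgmaxTransformer implementation), a wrapper that turns a probability-returning predictor into a label-returning one could look like this:

```python
import numpy as np

class ArgmaxTransformer:
    """Sketch of the idea: wrap a predictor returning class
    probabilities so that it returns class labels instead."""

    def __init__(self, predictor):
        self.predictor = predictor

    def __call__(self, x):
        probs = np.atleast_2d(self.predictor(x))
        return np.argmax(probs, axis=1)

# Usage with a hypothetical two-class probability predictor:
predict_proba = lambda x: np.column_stack((1 - x[:, 0], x[:, 0]))
labels = ArgmaxTransformer(predict_proba)(np.array([[0.9], [0.1]]))
print(labels)  # [1 0]
```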
Needs reproducing to proceed.
I have now reproduced this and one way this happens is when the instance to be explained has a categorical value that is not represented in the training data.
I will post a minimal example and some possible solutions. The easy option would be checking whether all categorical values (and numerical bins?) of the instance are present in the training set, and raising an error / returning an "empty" explanation if any are missing. Perhaps there is a workaround so we can still sample, but I would need to think more about it.
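The proposed check could be sketched as follows (check_instance_categories is a hypothetical helper, not part of alibi; it assumes the same {feature_index: category_names} mapping used by AnchorTabular):

```python
import numpy as np

def check_instance_categories(instance, train, category_map):
    """Hypothetical helper: raise if the instance has a categorical
    value that never appears in the training data."""
    for col in category_map:
        observed = set(np.unique(train[:, col]))
        if instance[col] not in observed:
            raise ValueError(
                f"Categorical feature {col} has value {instance[col]} "
                f"not present in the training data."
            )

# Training data where categorical feature 1 only takes values 0 and 1:
train = np.array([[0.1, 0.0], [0.2, 1.0]])
category_map = {1: ['category_0', 'category_1', 'category_2']}

check_instance_categories(np.array([0.5, 1.0]), train, category_map)  # fine
# check_instance_categories(np.array([0.5, 2.0]), train, category_map)  # would raise ValueError
```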
MWE as below.
Explanation: In the MWE below we simulate a situation where the instance to be explained has a categorical variable with a valid category, but one that is not observed in the training data. This leads to an error during the sampling process, because no rows in the training data satisfy a candidate anchor that fixes the categorical variable to that value.
Potential solutions to follow.
MWE:
import numpy as np
from alibi.explainers import AnchorTabular
SEED = 0
# DATA
N = 100
N_CAT = 3
# 1 numerical and 1 categorical feature
np.random.seed(SEED)
num = np.random.rand(N)
cat = np.random.randint(N_CAT, size=N)
data = np.column_stack((num, cat))
# filter out any rows where cat == 2
train = data[data[:, 1] != 2]
# metadata
feature_names = ['numerical', 'categorical']
category_map = {1: ['category_0', 'category_1', 'category_2']}
# MODEL
predictor = lambda x: x[:, 1].astype(int) # dummy model - categorical feature determines class
# EXPLAINER
explainer = AnchorTabular(predictor=predictor,
                          feature_names=feature_names,
                          categorical_names=category_map,
                          seed=SEED)
explainer.fit(train)
# EXPLANATION
bad_instance = np.array([0.0, 2])
explanation = explainer.explain(bad_instance) # breaks here
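As a sanity check on the MWE (and a user-side workaround until the library handles this case), one can verify that the instance's category is absent from the filtered training set but present in the full data; fitting the explainer on the unfiltered data would be expected to avoid the error. A sketch reusing the MWE's setup:

```python
import numpy as np

# Reproduce the MWE's data generation.
np.random.seed(0)
num = np.random.rand(100)
cat = np.random.randint(3, size=100)
data = np.column_stack((num, cat))
train = data[data[:, 1] != 2]  # filter out category 2, as in the MWE

# Category 2 is valid but unobserved in the filtered training set,
# which is exactly the condition that breaks the sampling step.
print(2 in train[:, 1])  # False
print(2 in data[:, 1])   # True - fitting on the full data avoids the error
```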