IndexError: index -1 is out of bounds for axis 0 with size 0

Open charlie9526 opened this issue 5 years ago • 11 comments

~/.local/lib/python3.6/site-packages/alibi/explainers/anchor_tabular.py in perturbation(self, anchor, num_samples)
    278         )
    279         nb_partial_anchors = np.array([len(n_records) for n_records in reversed(partial_anchor_rows)])
--> 280         coverage = nb_partial_anchors[-1] / self.n_records
    281 
    282         # if there are enough train records containing the anchor, replace the original records and return...

IndexError: index -1 is out of bounds for axis 0 with size 0

Anchor Tabular

I got this error for only one specific instance that I tried to explain. The explainer ran correctly more than 10,000 times without error. The model is a CatBoost classifier. The data row is:

array([-4.75903911e-01,  5.79251837e-02, -1.72716600e+00,  2.25349359e+00,
       -9.09370799e-02,  6.25132153e-01,  5.55555774e-01,  6.98985644e-17,
        1.00000000e+00,  0.00000000e+00,  5.00000000e+00,  2.55000000e+02,
        4.00000000e+00,  9.00000000e+00,  1.00000000e+00])

The model prediction for this row also worked properly (predict_fn(X_test[idx])).

I can't find any issue with the data instance either. Please help me.

charlie9526 avatar Nov 03 '20 12:11 charlie9526

Looks like the variable nb_partial_anchors is empty. This could theoretically happen if there are no rows in the training set where the proposed anchor holds. It could be a bug on our side, as we should catch that error if that is what is happening.

Can you run the code in a debugger and print out the variable values? Probably starting with allowed_rows in this method: https://github.com/SeldonIO/alibi/blob/ce4d695285236fe1b638446bedcabed5e8122d1f/alibi/explainers/anchor_tabular.py#L237-L306

jklaise avatar Nov 03 '20 14:11 jklaise

Thank you for the quick reply. In the current situation, I am not using a debugger. I will upload them later. Thank you!

charlie9526 avatar Nov 03 '20 14:11 charlie9526

@jklaise Fixed in alibi>=0.5.6 ?

https://github.com/SeldonIO/alibi/blob/9af1d73d046a2239ce73c12c2e1f835fecc89780/alibi/explainers/anchor_tabular.py#L281

enricorotundo avatar May 20 '21 12:05 enricorotundo

@enricorotundo I don't think so, because you would get the same error regardless of whether the index was 0 or -1.

I think the issue may originate because the list is empty (meaning partial_anchor_rows is empty). Is this something you've also experienced?
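
The point about the index can be checked in isolation with NumPy alone. The sketch below assumes the failing state is an empty partial_anchor_rows (that is the hypothesis in this thread, not something confirmed from the traceback) and rebuilds nb_partial_anchors the same way as line 279 of the traceback:

```python
import numpy as np

# Assumption: partial_anchor_rows is empty when the error occurs.
partial_anchor_rows = []
nb_partial_anchors = np.array([len(rows) for rows in reversed(partial_anchor_rows)])

# Indexing an empty array fails for any index, first or last.
errors = {}
for index in (-1, 0):
    try:
        nb_partial_anchors[index]
    except IndexError as exc:
        errors[index] = str(exc)
        print(f"index {index}: {exc}")
```

Both lookups raise IndexError, so switching between 0 and -1 would not change the outcome if the array is empty.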

jklaise avatar May 20 '21 12:05 jklaise

Yes, but I've worked around it by tweaking my dataset a bit so that I get different anchors and it doesn't fail.

enricorotundo avatar May 20 '21 13:05 enricorotundo

@enricorotundo can you elaborate a little bit? Maybe we can get to the bottom of this and implement appropriate handling.

jklaise avatar May 20 '21 13:05 jklaise

@jklaise I'm working with a customer dataset using AnchorTabular via Seldon so I'm not so sure how to provide a reproducible example. Anyhow, replacing AnchorTabular(clf.predict(x)) with AnchorTabular(clf.predict_proba(x)) magically works.

enricorotundo avatar May 25 '21 18:05 enricorotundo

@enricorotundo hmm, that's interesting. We do internal processing to convert models outputting probabilities to ones outputting labels using an ArgmaxTransformer. I wonder if there is some bug related to that: https://github.com/SeldonIO/alibi/blob/5f1be275f8d57fac83cee0e97bc5dbaf7a95e501/alibi/explainers/anchor_tabular.py#L961-L968

jklaise avatar May 26 '21 08:05 jklaise

Needs reproducing to proceed.

jklaise avatar Jul 14 '21 16:07 jklaise

I have now reproduced this and one way this happens is when the instance to be explained has a categorical value that is not represented in the training data.

I will post a minimal example and some possible solutions. The easy option would be checking whether all categorical values (and numerical bins?) of the instance are present in the training set and raising an Error / returning an "empty" explanation if not. Perhaps there is a workaround so we can still sample, but I would need to think more about it.
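
A rough sketch of what such a pre-check could look like, using NumPy only. The helper name find_unseen_categories and its arguments are hypothetical and not part of alibi; it covers categorical columns only and ignores numerical bins:

```python
import numpy as np

def find_unseen_categories(train, instance, categorical_columns):
    """Hypothetical helper: return {column: value} for every categorical
    value of `instance` that never occurs in that column of `train`."""
    unseen = {}
    for col in categorical_columns:
        if not np.any(train[:, col] == instance[col]):
            unseen[col] = instance[col]
    return unseen

# Toy training set: column 0 is numerical, column 1 is categorical.
# Category 2 is valid but never observed.
train = np.array([[0.1, 0], [0.5, 1], [0.9, 0]])

bad_instance = np.array([0.0, 2])
good_instance = np.array([0.0, 1])

unseen_bad = find_unseen_categories(train, bad_instance, categorical_columns=[1])
unseen_good = find_unseen_categories(train, good_instance, categorical_columns=[1])
print(unseen_bad, unseen_good)
```

A caller could run this before explain and raise a descriptive error (or skip the instance) whenever the returned dict is non-empty, instead of hitting the IndexError deep inside the sampler.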

jklaise avatar Aug 01 '22 12:08 jklaise

MWE as below.

Explanation: In the MWE below we simulate a situation where the example to be explained has a categorical variable with a valid category but one that is not observed in the training data. This leads to an error during the sampling process as there are no rows in the training data that would satisfy a candidate anchor equal to the value of the categorical variable.

Potential solutions to follow.

MWE:

import numpy as np

from alibi.explainers import AnchorTabular

SEED = 0

# DATA
N = 100
N_CAT = 3

# 1 numerical and 1 categorical feature
np.random.seed(SEED)
num = np.random.rand(N)
cat = np.random.randint(N_CAT, size=N)

data = np.column_stack((num, cat))

# filter out any rows where cat == 2
train = data[data[:, 1] != 2]

# metadata
feature_names = ['numerical', 'categorical']
category_map = {1: ['category_0', 'category_1', 'category_2']}

# MODEL
predictor = lambda x: x[:, 1].astype(int)  # dummy model - categorical feature determines class

# EXPLAINER
explainer = AnchorTabular(predictor=predictor,
                          feature_names=feature_names,
                          categorical_names=category_map,
                          seed=SEED)
explainer.fit(train)

# EXPLANATION
bad_instance = np.array([0.0, 2])
explanation = explainer.explain(bad_instance)  # breaks here

jklaise avatar Aug 01 '22 16:08 jklaise