KeyError: 4
Dear @marcotcr, I'm using a two-class dataset with 6 features. Everything works properly except this block of code:
i = np.random.randint(0, X_test.shape[0])
exp = explainer.explain_instance(X_test[i], rf.predict_proba, num_features=6, top_labels=1)
and I cannot make sense of the error:
KeyError                                  Traceback (most recent call last)
/Volumes/Data/opt/anaconda3/envs/TensorFlow_env/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    338             # Preventative code: if sparse, convert to csr format if not in csr format already
    339             data_row = data_row.tocsr()
--> 340         data, inverse = self.__data_inverse(data_row, num_samples)
    341         if sp.sparse.issparse(data):
    342             # Note in sparse case we don't subtract mean since data would become dense

/Volumes/Data/opt/anaconda3/envs/TensorFlow_env/lib/python3.7/site-packages/lime/lime_tabular.py in __data_inverse(self, data_row, num_samples)
    538         inverse = data.copy()
    539         for column in categorical_features:
--> 540             values = self.feature_values[column]
    541             freqs = self.feature_frequencies[column]
    542             inverse_column = self.random_state.choice(values, size=num_samples,

KeyError: 4
Would you please help me?
Can you share the lines where you instantiate the explainer? It looks as if X_test has a different shape than whatever you use to start the tabular explainer.
Other people have had the same issue (me included). It comes from an earlier line in LimeTabularExplainer.__data_inverse (line 508), where categorical_features is overridden like so: categorical_features = range(num_cols). This happens even when you have explicitly set categorical_features to an empty list when instantiating the object.
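A toy sketch of how such an override can end in a KeyError, assuming the per-column statistics are keyed by the training data's column indices while the loop range is taken from the row being explained. The names `sample_columns` and the 4-column training width are illustrative stand-ins, not lime's actual code:

```python
# Toy model only: this is NOT lime's implementation.
# Per-column statistics built from training data with 4 columns (keys 0..3).
feature_values = {col: [0, 1, 2, 3] for col in range(4)}


def sample_columns(data_row):
    """Look up the stored values for every column index of data_row."""
    num_cols = len(data_row)
    # The loop range comes from the row, not from the training data.
    categorical_features = range(num_cols)
    return [feature_values[column] for column in categorical_features]


# Same width as the training data: every lookup succeeds.
sample_columns([0.1, 0.2, 0.3, 0.4])

try:
    # One extra column: index 4 has no entry in feature_values.
    sample_columns([0.1, 0.2, 0.3, 0.4, 1.0])
except KeyError as err:
    missing_key = err.args[0]
    print(f"KeyError: {missing_key}")  # prints "KeyError: 4"
```

In this toy setup the failing key is the first column index that exists in the row but not in the training data, which is why the width of the row matters.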
This may happen if the training data you give to the LimeTabularExplainer has $n$ columns but the row you want to explain has $n+1$ columns, because you forgot to remove the target column.
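A minimal sketch of a width check to run before calling explain_instance, assuming NumPy arrays with the target stored in the last column. `full_data`, `X_train`, `row_with_target` and `check_same_width` are hypothetical names for illustration, not part of lime:

```python
import numpy as np

# Hypothetical data: 6 feature columns plus the target in the last column.
rng = np.random.default_rng(0)
full_data = rng.random((10, 7))

X_train = full_data[:, :-1]      # features only, shape (10, 6)
y_train = full_data[:, -1]       # target column, shape (10,)


def check_same_width(training_data, data_row):
    """Raise a readable error if the row's width differs from the training data's."""
    n_train = np.asarray(training_data).shape[1]
    n_row = np.asarray(data_row).shape[0]
    if n_train != n_row:
        raise ValueError(
            f"training data has {n_train} columns but the row has {n_row}; "
            "check whether the target column is still in the row"
        )


row_with_target = full_data[0]   # 7 values: target not removed
row = full_data[0, :-1]          # 6 values: target removed

check_same_width(X_train, row)   # passes silently

try:
    check_same_width(X_train, row_with_target)
except ValueError as err:
    print(err)
```

Running the check on the same arrays that are passed to the explainer turns the opaque KeyError into a message that names both column counts.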