lime
Problem using the RecurrentTabularExplainer on a mixed dataset (numerical and categorical)
My input data looks like this:
X_train.shape = (6697, 6, 23), and the last seven columns are categorical variables.
categorical_features=[16, 17, 18, 19, 20, 21, 22]
explainer = lime.lime_tabular.RecurrentTabularExplainer(
    X_train, training_labels=y_train.argmax(axis=1), feature_names=data_columns,
    discretize_continuous=True, categorical_features=categorical_features,
    class_names=['A', 'B', 'C'], categorical_names=categorical_names, discretizer='decile')
exp = explainer.explain_instance(X_test[1], model.predict, num_features=20, labels=(1,0,2))
I got the following error message:
IndexError Traceback (most recent call last)
<ipython-input-115-111b3ce428a0> in <module>
----> 1 exp = explainer.explain_instance(X_test[1], model.predict, num_features=20, labels=(1,0,2))
2 exp.show_in_notebook()
/usr/local/anaconda/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
700 num_samples=num_samples,
701 distance_metric=distance_metric,
--> 702 model_regressor=model_regressor)
/usr/local/anaconda/lib/python3.6/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
413 name = int(data_row[i])
414 if i in self.categorical_names:
--> 415 name = self.categorical_names[i][name]
416 feature_names[i] = '%s=%s' % (feature_names[i], name)
417 values[i] = 'True'
IndexError: index 3 is out of bounds for axis 0 with size 3
Any help would be highly appreciated.
Can you print out what categorical_names is?
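Besides printing it, a quick sanity check is to compare the largest integer code in each categorical column against the number of names supplied for it. The sketch below is only an illustration: check_categorical_names is a hypothetical helper (not part of lime), and it assumes categorical_names is a dict mapping an original feature index to a sequence of names, with X shaped (n_samples, n_timesteps, n_features).

```python
import numpy as np

def check_categorical_names(X, categorical_names):
    """Return (feature, min_code, max_code, n_names) for every categorical
    feature whose integer codes cannot index its list of names.

    Hypothetical helper, not part of lime. Assumes X has shape
    (n_samples, n_timesteps, n_features) and categorical_names maps an
    original feature index to a sequence of category names.
    """
    problems = []
    for feat, names in categorical_names.items():
        codes = X[:, :, feat]            # all samples, all timesteps
        min_code = int(codes.min())
        max_code = int(codes.max())
        # A code is usable only if 0 <= code < len(names).
        if min_code < 0 or max_code >= len(names):
            problems.append((feat, min_code, max_code, len(names)))
    return problems
```

An empty result means every code in the checked columns has a matching name. A non-empty result would point at a mismatch between the encoded values and categorical_names, for example codes that start at 1 instead of 0.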
@marcotcr In the explain_instance method used by RecurrentTabularExplainer, in this loop:
name = int(data_row[i])
if i in self.categorical_names:
    name = self.categorical_names[i][name]
The variable name is looked up with an incorrect index, because data_row is flattened (length n_timesteps * n_features) while i still refers to an original feature index. Therefore, in my opinion, it should be replaced with:
name = int(data_row[i * self.n_timesteps])
if i in self.categorical_names:
    name = self.categorical_names[i][name]
Perhaps (I'm not sure how you would do it) this method needs to be overridden in the class so that it stays compatible with each tabular explainer.
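To make the index mismatch concrete, here is a minimal sketch of how original feature indices could be mapped to flattened column indices. It rests on an assumption that should be checked against the installed lime version: that each (n_timesteps, n_features) instance is transposed and flattened feature-major, so feature f at timestep t lands in column f * n_timesteps + t. The helpers flat_columns and expand_categorical are hypothetical and not part of lime.

```python
import numpy as np

def flat_columns(feature_idx, n_timesteps):
    """Flattened column indices of one original feature.

    Assumes a feature-major layout: column = feature * n_timesteps + timestep.
    """
    start = feature_idx * n_timesteps
    return list(range(start, start + n_timesteps))

def expand_categorical(categorical_features, categorical_names, n_timesteps):
    """Expand per-feature categorical settings to per-flattened-column settings.

    Hypothetical helper: returns (flat_features, flat_names), where every
    timestep column of a categorical feature reuses that feature's names.
    """
    flat_features, flat_names = [], {}
    for feat in categorical_features:
        for col in flat_columns(feat, n_timesteps):
            flat_features.append(col)
            if feat in categorical_names:
                flat_names[col] = categorical_names[feat]
    return flat_features, flat_names

# Toy instance with the shapes from this issue: 6 timesteps, 23 features.
n_timesteps, n_features = 6, 23
row = np.arange(n_timesteps * n_features).reshape(n_timesteps, n_features)
flat = row.T.reshape(-1)  # feature-major flattening (the assumed layout)

# Under this layout, flat index 16 is NOT feature 16: it is feature 2, timestep 4.
assert flat[16] == row[4, 2]
# Feature 16 at timestep 0 sits at flat index 16 * n_timesteps.
assert flat[16 * n_timesteps] == row[0, 16]
```

Under that assumption, passing categorical_features=[16, ..., 22] would mark flattened columns 16 to 22 as categorical, which belong to original features 2 and 3 rather than to the intended categorical columns. That could explain an out-of-range lookup in categorical_names.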