Explaining LSTM keras with Eli5 library

Open ogreyesp opened this issue 6 years ago • 14 comments

Hi, I'm trying to use Eli5 for explaining an LSTM keras model for time series prediction. The keras model receives as input an array with shape (nsamples, timesteps, nfeatures).

This is my code:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance

def baseline_model():
    # One LSTM layer over (timesteps, nfeatures), followed by a single regression output
    model = Sequential()
    model.add(LSTM(32, input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    model.compile(loss='logcosh', optimizer='adam')
    return model

# Keras 2 names this argument `epochs`; `nb_epoch` is the older Keras 1 spelling
my_model = KerasRegressor(build_fn=baseline_model, epochs=30, batch_size=32, verbose=False)
history = my_model.fit(X_train, y_train)

So far, everything is OK. The problem comes when I execute the following line, which raises an error:

Note: X_train has a shape equal to (nsamples, timesteps, nfeatures) and y_train has a shape (nsamples)

perm = PermutationImportance(my_model, random_state=1).fit(X_train, y_train)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-75-c9cc23da0083> in <module>()
      2 d2_train_dataset = X_train.reshape((nsamples, timesteps * features))
      3 
----> 4 perm = PermutationImportance(my_model, random_state=1).fit(X_train, y_train)
      5 #eli5.show_weights(perm, feature_names = X.columns.tolist())

~/anaconda3/lib/python3.6/site-packages/eli5/sklearn/permutation_importance.py in fit(self, X, y, groups, **fit_params)
    183             self.estimator_.fit(X, y, **fit_params)
    184 
--> 185         X = check_array(X)
    186 
    187         if self.cv not in (None, "prefit"):

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    568         if not allow_nd and array.ndim >= 3:
    569             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 570                              % (array.ndim, estimator_name))
    571         if force_all_finite:
    572             _assert_all_finite(array,

ValueError: Found array with dim 3. Estimator expected <= 2.

What can I do to fix this error? How can I use eli5 with my LSTM Keras Model?

Best regards

ogreyesp avatar Jan 14 '19 12:01 ogreyesp

I get the same error. Any luck with this?

jsga avatar Feb 05 '19 10:02 jsga

@ogreyesp Hey! I think the problem here is that scikit-learn expects 2D NumPy arrays as the training data for a fit function, and the dataset you are passing here is a 3D array. Please try reshaping it to a 2D array.

kaustumbh7 avatar Feb 26 '19 16:02 kaustumbh7

I'm running into the same issue. When I reshape the training dataset to 2D, I then get an error because my model expects a 3D input.

~\Anaconda3\envs\keras-gpu\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    125                         ': expected ' + names[i] + ' to have ' +
    126                         str(len(shape)) + ' dimensions, but got array '
--> 127                         'with shape ' + str(data_shape))
    128                 if not check_batch_axis:
    129                     data_shape = data_shape[1:]

ValueError: Error when checking input: expected lstm_5_input to have 3 dimensions, but got array with shape (12, 324)

blaklodge avatar May 02 '19 21:05 blaklodge
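
The two errors above pull in opposite directions: `PermutationImportance` wants a 2D array, while the LSTM wants a 3D one. One possible way around this is a small adapter that accepts the flattened 2D array and reshapes it back to 3D before calling the model. The sketch below is only an illustration: `FlatTo3DRegressor` and `SumOfFirstFeature` are hypothetical names that appear nowhere in this thread, and the stand-in model replaces a real Keras model so the snippet runs with NumPy alone.

```python
import numpy as np


class FlatTo3DRegressor:
    """Hypothetical adapter: takes 2D rows and feeds the wrapped model 3D arrays."""

    def __init__(self, model, timesteps, nfeatures):
        self.model = model
        self.timesteps = timesteps
        self.nfeatures = nfeatures

    def _to_3d(self, X):
        # Undo X.reshape(nsamples, timesteps * nfeatures)
        return np.asarray(X).reshape(-1, self.timesteps, self.nfeatures)

    def fit(self, X, y, **fit_params):
        self.model.fit(self._to_3d(X), y, **fit_params)
        return self

    def predict(self, X):
        # Flatten (nsamples, 1) outputs to (nsamples,)
        return np.asarray(self.model.predict(self._to_3d(X))).reshape(-1)

    def score(self, X, y):
        # Coefficient of determination (R^2), the usual default score for regressors
        y = np.asarray(y, dtype=float).reshape(-1)
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - y.mean()) ** 2)
        return 1.0 - residual / total


class SumOfFirstFeature:
    """Stand-in for a fitted 3D-input model, used only to exercise the adapter."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return X[:, :, 0].sum(axis=1)


rng = np.random.default_rng(0)
X_3d = rng.normal(size=(20, 4, 3))      # (nsamples, timesteps, nfeatures)
y = X_3d[:, :, 0].sum(axis=1)
X_2d = X_3d.reshape(X_3d.shape[0], -1)  # (nsamples, timesteps * nfeatures)

adapter = FlatTo3DRegressor(SumOfFirstFeature(), timesteps=4, nfeatures=3)
print(adapter.predict(X_2d).shape)
print(adapter.score(X_2d, y))
```

With a real fitted model wrapped this way, something like `PermutationImportance(adapter, cv='prefit').fit(X_2d, y)` could then be tried, although that has not been verified against the eli5 and Keras versions used in this thread. Note that each column of the flattened array is a single (timestep, feature) pair, so the importances would be reported per pair rather than per original feature.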

I get the same error. Any luck with this?

simplezhang57 avatar Feb 03 '20 01:02 simplezhang57

> @ogreyesp Hey! I think the problem here is that scikit-learn expects 2D NumPy arrays as the training data for a fit function, and the dataset you are passing here is a 3D array. Please try reshaping it to a 2D array.

If I am not mistaken, LSTM layers require a 3D array, so I don't think eli5 can explain an LSTM.

jarrettyeo avatar May 17 '20 11:05 jarrettyeo
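
Permutation importance itself does not require 2D data; only the input check in eli5 does. As a rough illustration, the idea can be written by hand for 3D input: shuffle one feature across samples (keeping each sample's time series for that feature intact) and measure how much the error grows. The function `permutation_importance_3d` and the toy `toy_predict` model below are assumptions made for this sketch and are not part of eli5 or Keras.

```python
import numpy as np


def permutation_importance_3d(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when one feature is shuffled across samples.

    X has shape (nsamples, timesteps, nfeatures); predict maps X to nsamples values.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).reshape(-1)

    def mse(X_eval):
        predictions = np.asarray(predict(X_eval)).reshape(-1)
        return float(np.mean((predictions - y) ** 2))

    baseline = mse(X)
    importances = np.zeros(X.shape[2])
    for j in range(X.shape[2]):
        increases = []
        for _ in range(n_repeats):
            order = rng.permutation(X.shape[0])
            X_perm = X.copy()
            # Swap the whole time series of feature j between samples
            X_perm[:, :, j] = X[order][:, :, j]
            increases.append(mse(X_perm) - baseline)
        importances[j] = np.mean(increases)
    return importances


def toy_predict(X):
    # Toy model that only looks at feature 0
    return X[:, :, 0].sum(axis=1)


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5, 3))
y_demo = toy_predict(X_demo)
scores = permutation_importance_3d(toy_predict, X_demo, y_demo)
print(scores)
```

For a fitted Keras model, `predict` would presumably be `model.predict`. Because the toy model ignores features 1 and 2, shuffling them leaves the error unchanged, while shuffling feature 0 increases it.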

I discovered another library, SHAP, which can analyze LSTMs and other models that take 3D-array input, and I finally managed to get the feature importance for my LSTM model. For more details, check out my answer here.

jarrettyeo avatar May 28 '20 09:05 jarrettyeo

@jarrettyeo, I just have a question: should X_test have the same shape in both DE.shap_values and summary_plot, or do they need different shapes?

import numpy as np
import shap
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Use 100 random training samples as the background set
background = X_train[np.random.choice(X_train.shape[0], 100, replace=False)]

DE = shap.DeepExplainer(model, background)  # background is drawn from X_train, a 3D numpy.ndarray
shap_values = DE.shap_values(X_test, check_additivity=False)  # X_test is a 3D numpy.ndarray

shap.initjs()
shap.summary_plot(
    shap_values[0],
    X_test,
    feature_names=list_column,
    max_display=12,
    plot_type='bar')

rebeen avatar Jul 15 '20 04:07 rebeen

> @jarrettyeo, I just have a question: should X_test have the same shape in both DE.shap_values and summary_plot, or do they need different shapes?

@rebeen I can't remember what shape they need to be in, but here is some of my code from my project which you can adapt:

import numpy as np
import shap

def convert_3d_to_2d(array):
    # Flatten (nsamples, timesteps, nfeatures) to (nsamples, timesteps * nfeatures)
    if not isinstance(array, np.ndarray):
        raise TypeError("type(array)={} != numpy.ndarray".format(type(array)))
    return array.reshape(array.shape[0], array.shape[1] * array.shape[2])

DE = shap.DeepExplainer(model, X_train) # X_train is 3d array
shap_values = DE.shap_values(X_validate, check_additivity=False)  # X_validate is 3d array
shap.summary_plot(
        convert_3d_to_2d(shap_values[0]), # <- This is probably what you need
        X_validate,
        feature_names=list_columns
)

You didn't mention what problem you were facing but I am guessing it's whether shap_values[0] is correct for summary_plot(). If that is the case, you just need to convert shap_values[0] from 3d to 2d using a custom function convert_3d_to_2d(shap_values[0]) because the plot does not know how to plot it in 3d. Let me know if that works.

jarrettyeo avatar Jul 15 '20 04:07 jarrettyeo
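
To make the shapes in the reply above concrete, here is a NumPy-only sketch. A random array stands in for `shap_values[0]`, and the names in `list_columns` are invented for the example. It shows two options: flattening to one column per (timestep, feature) pair, which needs a matching list of names, and averaging absolute values over samples and timesteps to get one number per original feature. As far as I can tell, the features array passed to `summary_plot` would need the same 2D shape as the flattened values, and whether `shap_values` is a list at all may depend on the SHAP version, so treat both points as assumptions.

```python
import numpy as np

nsamples, timesteps, nfeatures = 8, 4, 3
list_columns = ["f0", "f1", "f2"]  # hypothetical feature names

# Stand-in for shap_values[0], shape (nsamples, timesteps, nfeatures)
rng = np.random.default_rng(0)
values_3d = rng.normal(size=(nsamples, timesteps, nfeatures))

# Option 1: flatten so each column is one (timestep, feature) pair.
# With NumPy's default C order, column index = t * nfeatures + f.
values_2d = values_3d.reshape(nsamples, timesteps * nfeatures)
flat_names = [f"{name}_t{t}" for t in range(timesteps) for name in list_columns]

# Option 2: one importance per original feature,
# the mean absolute value over samples and timesteps.
per_feature = np.abs(values_3d).mean(axis=(0, 1))

print(values_2d.shape, len(flat_names), per_feature.shape)
```

The first option keeps the time dimension visible in the plot labels; the second gives a ranking of the original features at the cost of hiding when in the sequence each feature mattered.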

Thank you. Actually, I ran into this problem:

TypeError                                 Traceback (most recent call last)
in ()
     14     feature_names=list_column,
     15     max_display=12,
---> 16     plot_type='bar')

/usr/local/lib/python3.6/dist-packages/shap/plots/summary.py in summary_plot(shap_values, features, feature_names, max_display, plot_type, color, axis_color, title, alpha, show, sort, color_bar, plot_size, layered_violin_max_num_bins, class_names, class_inds, color_bar_label, auto_size_plot)
    148                 summary_plot(
    149                     proj_shap_values, features[:, sort_inds] if features is not None else None,
--> 150                     feature_names=feature_names[sort_inds],
    151                     sort=False, show=False, color_bar=False,
    152                     plot_size=None,

TypeError: only integer scalar arrays can be converted to a scalar index

rebeen avatar Jul 15 '20 09:07 rebeen

@rebeen Can you open a question on Stack Overflow and link it here? Then we can avoid hijacking this eli5 thread.

jarrettyeo avatar Jul 15 '20 12:07 jarrettyeo

@jarrettyeo Thank you very much. I saw your answer on Stack Overflow and solved the problem, so now I want to run the code properly and let you know about the results. Rebeen

rebeen avatar Jul 15 '20 14:07 rebeen

@rebeen Please do, happy to help

jarrettyeo avatar Jul 15 '20 15:07 jarrettyeo

@jarrettyeo Thank you very much. I could not post the screenshot of the results here, so I sent it to you on LinkedIn. Could you please let me know your opinion?

rebeen avatar Jul 15 '20 17:07 rebeen

> I'm running into the same issue. When I reshape training dataset to 2D I will then get an error because my model is expecting a 3D input.
>
> ~\Anaconda3\envs\keras-gpu\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
>     125                         ': expected ' + names[i] + ' to have ' +
>     126                         str(len(shape)) + ' dimensions, but got array '
> --> 127                         'with shape ' + str(data_shape))
>     128                 if not check_batch_axis:
>     129                     data_shape = data_shape[1:]
>
> ValueError: Error when checking input: expected lstm_5_input to have 3 dimensions, but got array with shape (12, 324)

Getting the same, any updates on this?

hipoglucido avatar Aug 16 '23 13:08 hipoglucido