DeepExplain
Visual interpretation of attributions: text classification with LRP
Hi!
Thanks for sharing your code and explanations; they were extremely helpful.
I am a beginner with Keras and sentiment analysis, and I am currently training a CNN model to classify some textual data. As a next step, I would like to understand how each word in a sample contributes to the final classification. This is very similar to the "LRP application on IMDB dataset" in your paper and the demo on http://www.heatmapping.org/
At this point, I have the "explain" method working, but I am stuck on interpreting and visualizing the results. Ideally, I would map the attributions back to the original texts and create a heatmap over the words. My attributions output has shape (6, 150, 50), where 6 is the number of samples, 150 is the sequence length, and 50 is the embedding dimension.
Any suggestion would be appreciated!
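For the visualization part, a minimal sketch of an HTML text heatmap, assuming you already have one score per word and the corresponding word strings for a sample. The function name render_heatmap and the red/blue color scheme are illustrative choices, not part of DeepExplain or the heatmapping.org demo:

```python
import html


def render_heatmap(words, scores):
    """Return an HTML string where each word's background encodes its score.

    Positive scores are shaded red, negative (and zero) scores blue; the
    opacity is the absolute score divided by the largest absolute score.
    """
    if len(words) != len(scores):
        raise ValueError("need exactly one score per word")
    # avoid division by zero when all scores are 0 (or the list is empty)
    max_abs = max((abs(s) for s in scores), default=0.0) or 1.0
    spans = []
    for word, score in zip(words, scores):
        alpha = abs(score) / max_abs
        rgb = "255, 0, 0" if score > 0 else "0, 0, 255"
        spans.append(
            '<span style="background-color: rgba({}, {:.2f})">{}</span>'.format(
                rgb, alpha, html.escape(word)))
    return " ".join(spans)


# hypothetical words and scores, just to show the call
print(render_heatmap(["a", "great", "movie"], [0.1, 0.8, -0.2]))
```

Writing the result to an .html file and opening it in a browser is enough to inspect it; normalizing per sample keeps the strongest word fully colored in every text.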
Here is an outline of my code:
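import numpy as np
from keras import backend as K
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import (Activation, Conv1D, Dense, Dropout, Embedding,
                          Flatten, Input, MaxPooling1D)
from keras.models import Model, load_model
from keras.regularizers import l2
from deepexplain.tensorflow import DeepExplain

NUM_CATEGORY = 4
MAX_SEQUENCE_LENGTH = 150
MAX_NB_WORDS = 20000  # number of words in vocabulary
EMBEDDING_DIM = 50
VALIDATION_SPLIT = 0.333

# feed in preprocessed data: token_index, embedding_matrix,
# x_train, y_train, x_test, y_test are defined elsewhere
embedding_layer = Embedding(len(token_index) + 1,
                            EMBEDDING_DIM,  # 50
                            weights=[embedding_matrix],  # pretrained embedding matrix
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=True, name='embedding')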
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='input_x')
embedded_sequences = embedding_layer(sequence_input)
l_conv = Conv1D(filters=100, kernel_size=3, kernel_regularizer=l2(0.001))(embedded_sequences)
l_actv = Activation('relu')(l_conv)
l_dropout = Dropout(0.5)(l_actv)
l_pool = MaxPooling1D(5)(l_dropout)
l_flat = Flatten()(l_pool)
l_dense = Dense(50, kernel_regularizer=l2(0.05))(l_flat)
l_actv1 = Activation('relu')(l_dense)
l_dropout2 = Dropout(0.2)(l_actv1)
# keep the pre-softmax layer separate so it can be used as the explanation target
l_dense2 = Dense(NUM_CATEGORY, kernel_regularizer=l2(0.005), name='dense2')(l_dropout2)
pred = Activation('softmax')(l_dense2)
model = Model(inputs=sequence_input, outputs=pred)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, verbose=0),
    ModelCheckpoint('model-{epoch:03d}.h5', verbose=1, monitor='val_loss',
                    save_best_only=True, mode='auto'),
]
history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    batch_size=32, epochs=100, callbacks=callbacks)
score, acc = model.evaluate(x_test, y_test, batch_size=32)
with DeepExplain(session=K.get_session()) as de:
    # the model is (re)loaded inside the DeepExplain context
    model = load_model('model-036.h5')
    # references to tensors
    input_tensor = model.get_layer("input_x").input
    embedding = model.get_layer("embedding").output
    pre_softmax = model.get_layer("dense2").output
    # samples to explain
    x_interpret = x_test[9:15]
    # perform the embedding lookup: attributions are computed w.r.t. the
    # embedding output, because the integer word indices are not differentiable
    get_embedding_output = K.function([input_tensor], [embedding])
    embedding_out = get_embedding_output([x_interpret])[0]
    # target the output of the last dense layer (pre-softmax);
    # to target a specific neuron (class), multiply by a binary (one-hot) mask
    ys = [1, 0, 0, 0]
    # np-array of shape (6, 150, 50)
    attributions = de.explain('elrp', pre_softmax * ys, embedding, embedding_out)
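The binary map in the last step works by zeroing every logit except the selected one, so only that class contributes to the explained target. A small NumPy sketch of the same masking on hypothetical logits, plus a per-sample variant (shape (batch, classes)) that selects each sample's predicted class; that variant broadcasts against pre_softmax only if its first dimension matches the number of samples passed to explain:

```python
import numpy as np

NUM_CATEGORY = 4

# hypothetical pre-softmax scores (logits) for a batch of 3 samples
logits = np.array([[2.0, 0.5, -1.0, 0.3],
                   [0.1, 1.5, 0.2, -0.4],
                   [-0.3, 0.0, 0.9, 2.2]])

# a fixed one-hot mask keeps only class 0 for every sample
ys_fixed = np.array([1, 0, 0, 0])
masked_fixed = logits * ys_fixed  # columns 1..3 become 0

# one mask row per sample, selecting that sample's predicted class
predicted = logits.argmax(axis=1)
ys_pred = np.eye(NUM_CATEGORY)[predicted]
selected = (logits * ys_pred).sum(axis=1)  # logit of each predicted class

print(masked_fixed)
print(predicted, selected)
```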
Hi there!
What is normally done for NLP, is to sum up attributions over the embedding dimensions. In your case, you would do np.sum(attributions, -1) and end up with an array of size (6, 150). Now you have a score for each word, that you can visualize as you prefer.
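A minimal sketch of that reduction, using small random stand-in arrays instead of real DeepExplain output. index_word is a hypothetical reverse vocabulary (it could be built from the question's token_index with {i: w for w, i in token_index.items()}), and index 0 is assumed to be padding:

```python
import numpy as np

# hypothetical stand-ins: 2 samples, sequence length 5, embedding dim 3
attributions = np.random.RandomState(0).randn(2, 5, 3)
x_interpret = np.array([[4, 7, 2, 0, 0],
                        [9, 3, 0, 0, 0]])  # word indices, 0 = padding
index_word = {2: "movie", 3: "bad", 4: "a", 7: "great", 9: "really"}

# one score per token: sum over the embedding dimension (last axis)
word_scores = np.sum(attributions, axis=-1)  # shape (2, 5)

# pair every token with its score for inspection or plotting
for tokens, scores in zip(x_interpret, word_scores):
    pairs = [(index_word.get(int(t), "<pad>"), float(s))
             for t, s in zip(tokens, scores)]
    print(pairs)
```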
Hi Marco, thanks for helping out!
I have applied the summation to the output attributions matrix. The problem is that the number of nonzero values for each sample does not match the number of words it contains. Does that make sense?
For example, one input text is cleaned into a 27-word list (before embedding), padded on the right. However, its summed attribution weights contain 98 nonzero values, which does not correspond to the word list. I was expecting 27 nonzero values followed by zeros for the padding. Is there anything wrong with my understanding or implementation?
Many thanks!
Sample output of the summed attribution weights (150, 1):
[0.0047955 -0.0515381 -0.0528361 -0.195269 -0.0927481 -0.0920452 -0.0133876 0.00504693 -0.0455068 0.0607065 0.0758778 -0.0437094 0.105729 0.160636 -0.0028076 -0.117737 0.00311046 0.194461 0.135671 0.110874 0.0623835 0.117782 0.0749264 0.0127667 0.1243 0.0727738 -0.0178832 -0.000369398 0.000776381 0 0.000122282 0.00010872 -7.24178e-06 0 0 0.000255847 0.000227472 -1.51518e-05 0 0 0.000161644 0.000143716 -9.57289e-06 0 0 0.000129164 0.000114838 -7.64935e-06 0 0 9.28584e-05 8.25596e-05 -5.49928e-06 0 0 3.7087e-05 3.29738e-05 -2.19637e-06 0 0 0.000121541 0.000108061 -7.19789e-06 0 0 -4.44104e-05 -3.94849e-05 2.63008e-06 0 0 -1.30395e-05 -1.15933e-05 7.72226e-07 0 0 2.89512e-05 2.57402e-05 -1.71455e-06 0 0 -4.49442e-05 -3.99595e-05 2.66169e-06 0 0 -1.60106e-05 -1.42349e-05 9.4818e-07 0 0 -1.14543e-06 -1.01839e-06 6.78348e-08 0 0 -8.79873e-07 -7.82288e-07 5.21079e-08 0 0 -2.24294e-05 -1.99418e-05 1.32831e-06 0 0 -4.43908e-05 -3.94675e-05 2.62892e-06 0 0 -3.49119e-05 -3.10399e-05 2.06756e-06 0 0 6.78908e-06 6.03612e-06 -4.02063e-07 0 0 6.4125e-06 5.7013e-06 -3.79762e-07 0 0 6.68977e-06 5.94782e-06 -3.96183e-07 0 0 5.62173e-06 4.99824e-06 -3.32931e-07 0 0 5.20878e-06 4.63108e-06 -3.08475e-07 0 0 6.9628e-06 6.19057e-06 -4.12352e-07 0 0 0 0 0 0 0]
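In this output, the first positions (the real words) carry much larger scores than the padded positions, which are small but often nonzero. One possibility, not confirmed in this thread, is that the padding index still maps to a nonzero embedding vector: with trainable=True the row for index 0 can be updated during training. Whatever the cause, a minimal sketch of restricting the scores to real tokens, assuming index 0 is the padding value (the Keras pad_sequences default); mask_padding is a hypothetical helper name:

```python
import numpy as np


def mask_padding(word_scores, x_interpret, pad_index=0):
    """Zero out scores at padding positions.

    word_scores: array (n_samples, seq_len), e.g. attributions summed over
    the embedding axis. x_interpret: the integer model inputs, same shape.
    """
    word_scores = np.asarray(word_scores, dtype=float)
    mask = np.asarray(x_interpret) != pad_index  # True for real tokens
    return np.where(mask, word_scores, 0.0)


# hypothetical example: 3 real tokens followed by 2 padding tokens
scores = np.array([[0.2, -0.1, 0.4, 1e-4, -3e-5]])
tokens = np.array([[12, 5, 33, 0, 0]])
print(mask_padding(scores, tokens))
```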
@marcoancona, regarding "What is normally done for NLP is to sum up attributions over the embedding dimensions": could you please point to relevant papers? Thanks!