
Code to Visualize Attention Weights

Open ni9elf opened this issue 7 years ago • 8 comments

I need some help writing code to obtain and visualize the attention weights, like the heat maps in the HAN paper. To obtain the attention weights, I'm currently thinking of getting the hidden representations of the GRUs (h_it) and then manually computing the attention weights from h_it using the equations in the call function of the attention layer.

layer_name = 'GRU'
intermediate_layer_model = Model(input=model.input, output=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(input_variable)
h_it = intermediate_output  # use h_it from above to compute attention weights
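
For what it's worth, a minimal numpy sketch of that manual computation, assuming the GRU returns the full hidden-state sequence and that the attention layer's get_weights() returns its parameters in the order [W, b, u] (the layer name and variable names here are illustrative, not from the repo):

import numpy as np

# Assumed: h_it has shape (timesteps, hidden_dim) for one example, and the
# attention layer's weights come back in the order [W, b, u].
W, b, u = model.get_layer('attention_layer_name').get_weights()

u_it = np.tanh(np.dot(h_it, W) + b)          # (timesteps, attention_dim)
scores = np.dot(u_it, u).squeeze(-1)         # (timesteps,)
exp_scores = np.exp(scores - scores.max())   # numerically stable softmax
alpha = exp_scores / exp_scores.sum()        # attention weights, one per timestep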

If there is a more direct way (a direct function call in Keras, or some existing code), that would be helpful.

ni9elf avatar May 18 '17 07:05 ni9elf

Any update on the visualization part? @ni9elf @richliao I'm trying to get the weights, but they always end up as 1.0:

> _dot = np.dot(out[0], att_w[0])
> _tanh = np.tanh(_dot)
> _exp = np.exp(_tanh)
> weights = _exp / np.sum(_exp)
> weights
array([1.], dtype=float32)

Here, out[0] and att_w[0] are my output-layer and attention-layer weights for the given sentence, respectively. Any thoughts?
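
One possible explanation (this is a guess about what out[0] contains): if out[0] is a single vector rather than the sequence of GRU hidden states, the dot product yields one scalar score, and a softmax over a single score is always 1.0. A tiny numpy illustration of the difference:

import numpy as np

single_score = np.array([2.3])                       # softmax over one score
print(np.exp(single_score) / np.sum(np.exp(single_score)))          # -> [1.]

per_step_scores = np.array([0.2, 1.5, -0.3])         # one score per timestep
print(np.exp(per_step_scores) / np.sum(np.exp(per_step_scores)))    # a real distribution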

spate141 avatar May 03 '18 19:05 spate141

@spate141 @ni9elf @richliao Do you have any update on the visualization part? Can you help with this?

deepankar27 avatar May 28 '18 19:05 deepankar27

@deepankar27 the closest working solution I got is this: https://github.com/cbaziotis/neat-vision

spate141 avatar May 29 '18 18:05 spate141

@spate141 Thanks!! Looks like a nice tool, but how do I feed my model's attention values into it along with the predicted score for each label? And how can I get the attention values in the first place? Where do I get this att_w[0] from while predicting a label? It could be a stupid question...

deepankar27 avatar May 29 '18 18:05 deepankar27

@deepankar27 If your model has an attention layer, you can easily get the output of that layer. With Keras, you can get it like this: Obtain the output of an intermediate layer with Keras
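
For reference, a short sketch of that Keras pattern (the layer name 'att_layer' and the input x_test are placeholders; note that newer Keras versions use the inputs=/outputs= keyword names, whereas the first comment above used input=/output=):

from keras.models import Model

# Map the original model inputs to the attention layer's output.
intermediate_model = Model(inputs=model.input,
                           outputs=model.get_layer('att_layer').output)
intermediate_output = intermediate_model.predict(x_test)  # x_test: your input data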

spate141 avatar May 30 '18 03:05 spate141

@spate141 Awesome!! Thanks a lot!!

deepankar27 avatar May 30 '18 05:05 deepankar27

@spate141 @deepankar27 @richliao: I am still having issues capturing the attention weights. I believe I am getting the weights using att_w = model.get_layer('hierarchical_attn_2').get_weights(). This is a list of lists of shape [3, 200, 200]. Should this weight matrix be reshaped? Can you provide any assistance on how to translate this into the weights for my incoming text?
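
If that layer is the modified AttLayer posted below, my guess is that get_weights() returns the three parameter arrays [W, b, u] created in its build method. These are model parameters, not per-text attention values, so the layer's input for your text is still needed:

# Assumed ordering [W, b, u], matching the order they are created in build().
W, b, u = model.get_layer('hierarchical_attn_2').get_weights()

# To turn these into attention weights for a specific text, feed that text's
# hidden states through tanh(h.W + b), dot with u, and softmax, as in call().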

arunarn2 avatar Nov 07 '18 05:11 arunarn2

First, make these changes in that attention class to visualize sentence-level weights. I haven't taken the time to account for word-level attention visualization, though.

# Imports needed by this class (on newer Keras versions, Layer can also be
# imported from keras.layers).
from keras import backend as K
from keras import initializers
from keras.engine.topology import Layer

class AttLayer(Layer):
    def __init__(self,attention_dim,**kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)),name='Attention_weight' )
        self.b = K.variable(self.init((self.attention_dim, )),name = 'Attention_Bias' )
        self.u = K.variable(self.init((self.attention_dim, 1)),name = 'Attention_power')
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return None

    def call(self, x, mask=None):
        # size of x :[batch_size, sel_len, attention_dim]
        # size of u :[batch_size, attention_dim]
        # uit = tanh(xW+b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)

        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
      
    def _get_attention_weights(self, X):
        # Recomputes the normalized attention weights for the given input
        # tensor X and returns them (instead of the weighted sum), so they
        # can be extracted for visualization.
        uit = K.tanh(K.bias_add(K.dot(X, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        return ait

I am assuming the code below is a continuation of richliao's code, after model fitting.

Later, after training the model, you can infer the attention weights for your test data as follows:

from keras.layers import Lambda
from keras.models import Model

# 'sentence_attention' is the name given to the sentence-level AttLayer;
# use whatever name that layer has in your model.
att_layer = model.get_layer('sentence_attention')
prev_tensor = att_layer.input

# Wrap the weight extraction in a Lambda layer so it can be used as a model output.
dummy_layer = Lambda(lambda x: att_layer._get_attention_weights(x))(prev_tensor)

attention_weights = Model(model.input, dummy_layer).predict(x_val)

## shape of the above matrix is: (size of the validation set, MAX_SENTS, 1)
## that means, for each sentence, we get the attention ranking along dimension 1, obviously :D

Note: I used x_val here, but try dividing your data into train, test, and validation sets (unlike richliao's train and validation sets only). Then visualize the sentence weights.
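
As one possible way to do that last step, a minimal matplotlib sketch of a heat map over sentences (attention_weights comes from the predict call above; the document index and styling are placeholders):

import numpy as np
import matplotlib.pyplot as plt

# attention_weights has shape (num_docs, MAX_SENTS, 1); squeeze the trailing
# dimension to get one weight per sentence for the chosen document.
doc_idx = 0
sent_weights = np.squeeze(attention_weights[doc_idx], axis=-1)

plt.figure(figsize=(6, 3))
plt.imshow(sent_weights[np.newaxis, :], cmap='Reds', aspect='auto')
plt.colorbar(label='attention weight')
plt.yticks([])
plt.xlabel('sentence index')
plt.title('Sentence-level attention for document %d' % doc_idx)
plt.show()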

Mail me at [email protected] and I will share a complete extension of this code to save the model, visualize the weights, and so on :D

robin-fusemachines avatar Jul 19 '19 06:07 robin-fusemachines