Sequence-to-Sequence-and-Attention-from-scratch-using-Tensorflow
soft attention function
Thanks for the clean and easy-to-read code! I think there might be a bug in the soft attention module:
eij = tf.tanh(unrol_states)
# Softmax across the unrolling (time) dimension
softmax = tf.nn.softmax(eij, dim=1)
context = tf.reduce_sum(tf.multiply(softmax, unrol_states), axis=1)  # sum across the time axis
According to the cited attention paper, the eij in your code corresponds to the e_ij on page 3, and the softmax variable in the code should be the \alpha_{ij} from equation (6) on page 3 of the paper. So far so good. In the paper, however, the authors use the \alpha_{ij} to build the context vector as a weighted average of the raw encoder outputs h1, ..., hN, while you first transform these encoder outputs by
for h in range(num_unrollings):
    hidden_states[h] = tf.multiply(hidden_states[h], attn_weights) + prev_hidden_state_times_w
and only then apply the context averaging. Is there a special reason for doing it this way?
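For reference, this is how I read equation (6) and the context vector in the paper: the weights are a softmax over the scores, and the context is their weighted sum over the untransformed encoder outputs (writing N for the number of encoder steps):

```latex
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{N} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{N} \alpha_{ij} h_j
```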
Thanks a lot and cheers, g
The implementation is based on the following blog post: https://blog.heuritech.com/2016/01/20/attention-mechanism/ . The soft attention returns a weighted arithmetic mean of the y_i, where the weights are chosen according to the relevance of each y_i given the context c. These are the weights that are multiplied in the for loop.
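To make the weighted-mean view concrete, here is a minimal NumPy sketch of that idea. The `soft_attention` name and the dot-product relevance score are my own simplifications for illustration; the blog and this repo use a learned tanh scoring network instead:

```python
import numpy as np

def soft_attention(ys, c):
    """Weighted arithmetic mean of the y_i (rows of ys), with weights
    given by a softmax over a relevance score between each y_i and the
    context c. The dot-product score is an assumption for brevity; the
    blog/repo use a small learned network to produce the scores."""
    scores = ys @ c                       # relevance of each y_i to c, shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax -> alpha_i, sums to 1
    return weights @ ys                   # sum_i alpha_i * y_i

# Example: 4 encoder outputs of dimension 3
ys = np.random.randn(4, 3)
c = np.random.randn(3)
z = soft_attention(ys, c)                 # context vector, shape (3,)
```

With a zero context all scores are equal, so the result degenerates to the plain arithmetic mean of the y_i, which shows the "weighted mean" interpretation directly.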