show_attend_and_tell.tensorflow
A lot of weight matrices
In the build_model function I see a lot of weight matrices, for example image_att_W, hidden_att_W, att_W, image_encode_W and so on, and I don't know why.
In my opinion, an LSTM has two weight matrices, w for the input and u for the hidden state, so I would write the code in the for loop like this:
context_encode = input * w + b
context_encode += h * u
context_encode = tanh(context_encode)
But what is this about: alpha = tf.matmul(context_encode_flat, self.att_W) + self.att_b
And in line 110 there is a softmax again, and in line 114 yet another weight matrix, image_encode_W.
Hello, the LSTM used in this project takes not only the input (words) and the last state, but also aggregated image features. That is why it has extra weights. Since "Show, Attend and Tell" is about attending to a specific part of an image, some additional weights for the attention mechanism are also used. The variables that have "att" in their names are all about attention.
The alpha values are the attention values. If the model decides to attend to the upper-left part of an image, the alpha value corresponding to that region will be large.
I hope this answers your questions. If you have other questions, please let me know. Thank you. -Taeksoo
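To make the role of each extra weight concrete, here is a rough NumPy sketch of the soft-attention step described above. The variable names mirror the ones asked about (image_att_W, hidden_att_W, att_W, att_b, image_encode_W), but the shapes and random initialisation are my own illustrative assumptions, not the repo's exact values:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Assumed shapes: 196 image regions (a 14x14 conv feature map),
# 512-d region features, 256-d LSTM hidden state.
n_regions, dim_ctx, dim_hidden = 196, 512, 256
rng = np.random.default_rng(0)

context = rng.standard_normal((n_regions, dim_ctx))  # one row per image region
h = rng.standard_normal(dim_hidden)                  # previous LSTM hidden state

image_att_W = rng.standard_normal((dim_ctx, dim_ctx)) * 0.01
hidden_att_W = rng.standard_normal((dim_hidden, dim_ctx)) * 0.01
att_W = rng.standard_normal((dim_ctx, 1)) * 0.01
att_b = np.zeros(1)
image_encode_W = rng.standard_normal((dim_ctx, dim_hidden)) * 0.01

# 1) project the image features and the hidden state into a shared
#    space and combine them with tanh
context_encode = np.tanh(context @ image_att_W + h @ hidden_att_W)

# 2) score every region with att_W, then softmax -> alpha, a
#    probability distribution over the 196 regions
alpha = softmax((context_encode @ att_W + att_b).ravel())

# 3) alpha-weighted sum of region features: the "attended" context
weighted_context = alpha @ context

# 4) image_encode_W projects the attended context into the LSTM's
#    input space, which is why the LSTM sees the image at every step
lstm_image_input = weighted_context @ image_encode_W
```

So the attention weights produce alpha, and image_encode_W is what lets the attended image features enter the LSTM alongside the word input.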
2016-05-23 15:31 GMT+09:00 qingzew [email protected]: (quoting the original question above)
Hi, thank you for your answer. It quickly cleared up part of it, but not all. The LSTM has the image as an input, so this line
context_encode = tf.matmul(context_flat, self.image_att_W)
gives it a weight, right? But I think that's not necessary?
Between lines 100 and 110 it computes the attention values, but why call an activation function twice, and why call reshape and softmax? What's the problem with tanh? Can't it be used for the attention values?
I think I need to read more about 'show and tell'.
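On the tanh vs. softmax question, a tiny NumPy comparison (the scores below are made up for illustration) shows why tanh alone can't serve as the attention values:

```python
import numpy as np

scores = np.array([0.8, -1.2, 2.5, 0.1])  # hypothetical per-region attention scores

# tanh squashes each score independently into (-1, 1); the results can
# be negative and do not sum to 1, so they are not usable as weights
# for mixing region features.
tanh_vals = np.tanh(scores)

# softmax turns the same scores into a probability distribution over
# regions: every weight is non-negative and they sum to 1, so
# alpha @ features is a proper convex combination of region features.
e = np.exp(scores - scores.max())
alpha = e / e.sum()
```

That is why the code applies tanh first (as a nonlinearity inside the scoring MLP) and softmax afterwards (to normalise the scores into alpha).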
@qingzew The computation of context_encode also confused me. Is it clear to you now?
@Liu0329 You can read this blog post: https://blog.heuritech.com/2016/01/20/attention-mechanism/. It explains attention clearly, but it is a little different from this project.
@qingzew Great, thanks!
@qingzew Have you trained a model that can be used?
@Liu0329 No, image captioning is not my focus. I'm working on NLP with an attention model, but I'm having some problems implementing the model in TensorFlow.