a lot of weight matrices

Open qingzew opened this issue 8 years ago • 7 comments

In the build_model function, I see a lot of weight matrices, for example image_att_W, hidden_att_W, att_W, image_encode_W and so on, and I don't know why. In my opinion, the LSTM has two weight matrices, w for the input and u for the hidden state, so I would write the code in the for loop like this:

context_encode = input * w + b
context_encode += h * u
context_encode = tanh(context_encode)

But what is alpha = tf.matmul(context_encode_flat, self.att_W) + self.att_b about? And why is there another softmax in line 110, and another weight matrix image_encode_W in line 114?

qingzew avatar May 23 '16 06:05 qingzew

Hello, the LSTM used in this project takes not only the input (words) and the last state, but also aggregated image features. That is why it has extra weights. Since "Show, Attend and Tell" is about attending to a specific part of an image, some additional weights for the attention mechanism are used as well. The variables with "att" in their names are all part of the attention mechanism.

The alpha values are the attention values. If the model decides to attend to the upper-left part of an image, the alpha value corresponding to that region will be large.

Hope this answers your questions. If you have any others, please let me know. Thank you. -Taeksoo
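To make the extra weights concrete, here is a minimal NumPy sketch of the attention step described above. The variable names image_att_W, hidden_att_W, and att_W follow the repo, but the shapes, initialization, and exact wiring here are my own assumptions for illustration, not the project's code:

```python
import numpy as np

rng = np.random.default_rng(0)

L, D, H = 196, 512, 256   # L image regions, D-dim features, H-dim LSTM state
context = rng.standard_normal((L, D))       # CNN features, one row per region
h = rng.standard_normal(H)                  # previous LSTM hidden state

# Extra weights beyond the usual LSTM w/u: they project the image features
# and the hidden state into a shared space and score each region.
image_att_W = rng.standard_normal((D, D)) * 0.01
hidden_att_W = rng.standard_normal((H, D)) * 0.01
att_W = rng.standard_normal((D, 1)) * 0.01

context_encode = np.tanh(context @ image_att_W + h @ hidden_att_W)
scores = (context_encode @ att_W).ravel()   # one scalar score per region

# softmax over regions -> the alpha values (attention weights)
scores = scores - scores.max()              # for numerical stability
alpha = np.exp(scores) / np.exp(scores).sum()

# weighted sum of region features: the aggregated image input to the LSTM
weighted_context = alpha @ context
```

The region with the largest alpha is where the model "looks" at this time step, and weighted_context is the extra input that makes this LSTM different from a plain one.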

jazzsaxmafia avatar May 23 '16 06:05 jazzsaxmafia

Hi, thank you for your answer. I quickly understood some of it, but not all. The LSTM has the image as an input, so this line

context_encode = tf.matmul(context_flat, self.image_att_W)

applies a weight to it, right? But I think it's not necessary?

Between lines 100 and 110 it computes the attention values, but why call an activation function twice? Why does it call reshape and then softmax? What is the problem with tanh, can't it be used directly for the attention values?

I think I need to read more about 'show and tell'
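On the tanh vs. softmax question: tanh squashes each score into [-1, 1] independently, while softmax turns the scores into nonnegative weights that sum to 1, so the weighted sum over regions is a proper average. A tiny sketch (my own illustration with made-up scores, not the repo's code):

```python
import numpy as np

scores = np.tanh(np.array([2.0, -1.0, 0.5]))  # per-region scores in [-1, 1]
# tanh scores can be negative and do not sum to 1, so they cannot
# serve directly as mixture weights over the image regions.

alpha = np.exp(scores) / np.exp(scores).sum()  # softmax
# alpha is nonnegative and sums to 1: a distribution over regions,
# which is what the attention mechanism needs.
print(alpha.round(3))
```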

qingzew avatar May 23 '16 07:05 qingzew

@qingzew the computation of context_encode also confused me. Is it clear to you now?

Liu0329 avatar Jun 29 '16 03:06 Liu0329

@Liu0329 you can look at this blog post: https://blog.heuritech.com/2016/01/20/attention-mechanism/. It is clear, though a little different from this project.

qingzew avatar Jun 29 '16 07:06 qingzew

@qingzew great, thanks!

Liu0329 avatar Jun 29 '16 13:06 Liu0329

@qingzew Have you trained a model that can be used?

Liu0329 avatar Jun 29 '16 14:06 Liu0329

@Liu0329 No, image captioning is not my focus. I'm doing something in NLP with an attention model, but I have some problems implementing the model in TensorFlow.

qingzew avatar Jun 30 '16 01:06 qingzew