show_attend_and_tell.tensorflow
Bug reports and problems
@jazzsaxmafia Have you trained a good model with your code?
There might be a bug:
In the build_model and build_generator functions in model_tensorflow.py, the line
h = o * tf.nn.tanh(new_c)
should be replaced by
h = o * tf.nn.tanh(c)
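For context, here is a minimal NumPy sketch of a standard LSTM step that mirrors the repo's naming, where the tanh candidate is called new_c (g below) and the updated cell state is c; all names here are illustrative, not the repo's actual code:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x, h_prev] to the four gate pre-activations
    z = np.concatenate([x, h_prev]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)            # candidate (called new_c in the repo)
    c = f * c_prev + i * g    # updated cell state
    h = o * np.tanh(c)        # hidden state uses the updated cell state, not the candidate
    return h, c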
Another problem is about context_encode. Is it the same as in the original code? Moreover, I think the data should be shuffled at every epoch; the code seems to shuffle it only once.
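For illustration, a minimal sketch of per-epoch shuffling (the array names are hypothetical, not the repo's):
import numpy as np

def epoch_batches(features, captions, batch_size, rng=np.random):
    # draw a fresh permutation at the start of every epoch
    order = rng.permutation(len(captions))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], captions[idx]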
Yes, you are totally right. I should have corrected my code, but I have been too busy lately. I will work on it as soon as possible.
Thank you for pointing the bugs out. -Taeksoo
@Liu0329 @jazzsaxmafia
I agree with Liu0329.
Is there another bug in the next line:
logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b
should be replaced by
logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b \
    + tf.matmul(weighted_context, self.decode_lstm_image_W) \
    + tf.matmul(word_emb, self.decode_lstm_word_W)
According to Eq. (7) in the original paper, shouldn't the logits be computed as W^T concat(weighted_context, word_emb, h)?
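For reference, a minimal sketch of such a deep-output logits computation; decode_lstm_image_W / decode_lstm_word_W are the parameter names proposed above, not necessarily present in the repo:
import numpy as np

def deep_output_logits(h, weighted_context, word_emb, W_h, W_ctx, W_word, b):
    # logits combine the hidden state, the attention-weighted context,
    # and the previous word embedding, in the spirit of Eq. (7) of the paper
    return h @ W_h + weighted_context @ W_ctx + word_emb @ W_word + b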
@RutgersHan Have you trained a model that can actually be used?
@Liu0329 I did not use the attention model for image captioning; I used it for another task. For me, the result is not very good: it is only slightly better than the model without attention. I am not sure whether that is due to my implementation or something else. That is why I asked the previous question: do you think there is a bug in the logits computation in this implementation?
@RutgersHan Yes, according to the paper the logits have three parts, as you showed. I will now try training again. This repo seems unfinished; many parts are simplified compared with the original code. We can help improve it.
@Liu0329 @jazzsaxmafia @RutgersHan
Hello, I am currently reading the code, and thanks for the post! For the alpha calculation, aren't we supposed to use the context ('context_encode') and the previous hidden state? In the implementation, however, it seems to be summing up all the previous hidden states:
context_encode = context_encode \
    + tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) \
    + self.pre_att_b
Thanks!
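For comparison, a minimal sketch of soft-attention weights computed from the annotation vectors and only the previous hidden state (parameter and variable names here are illustrative, not the repo's):
import numpy as np

def soft_attention_alpha(context, h_prev, W_ctx, W_h, b_att, w_att):
    # context: (L, D) annotation vectors; h_prev: (H,) previous hidden state
    # the scores depend on the context and the previous hidden state only,
    # not on an accumulated sum over all past hidden states
    e = np.tanh(context @ W_ctx + h_prev @ W_h + b_att) @ w_att
    e = np.exp(e - e.max())
    return e / e.sum()  # softmax over the L spatial locations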
@Liu0329 @jazzsaxmafia @RutgersHan @1230pitchanqw Hello, I am an undergraduate from Communication University of China, and I am currently working on this area for my graduation thesis. Could you please tell me whether you obtained a satisfactory result based on this repo? If so, how many epochs did you train to get a good model, and what else did you change in the original code? I would appreciate a reply. Thank you very much.