show_attend_and_tell.tensorflow

Bug reports and problems

Open · Liu0329 opened this issue 8 years ago · 7 comments

@jazzsaxmafia Have you managed to train a good model with your code?

There might be a bug: in the build_model and build_generator functions of model_tensorflow.py, h = o * tf.nn.tanh(new_c) should be replaced by h = o * tf.nn.tanh(c)
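In a standard LSTM the hidden state is computed from the updated cell state, not from the candidate value, so the fix above matches the usual equations. A minimal sketch of one step with the corrected last line; the gate names i, f, o, new_c mirror the repo's build_model, while the projection matrices W, U, b are illustrative placeholders:

import tensorflow as tf  # TF 1.x style, matching the era of this repo

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM step: project the input and previous hidden state, split into gates.
    preactive = tf.matmul(x_t, W) + tf.matmul(h_prev, U) + b   # [batch, 4*hidden]
    i, f, o, new_c = tf.split(preactive, 4, axis=1)
    i, f, o = tf.sigmoid(i), tf.sigmoid(f), tf.sigmoid(o)
    new_c = tf.nn.tanh(new_c)            # candidate cell content
    c = f * c_prev + i * new_c           # updated cell state
    h = o * tf.nn.tanh(c)                # fix: tanh of c, not of new_c
    return h, c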

Another problem is about context_encode. Is it the same as in the original code? Moreover, I think the data should be reshuffled for each epoch; the code seems to shuffle it only once.
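On the shuffling point, a minimal sketch of reshuffling at the start of every epoch; this is a generic NumPy example, and the function and argument names are illustrative rather than taken from this repo:

import numpy as np

def iterate_minibatches(features, captions, batch_size, n_epochs, seed=0):
    # Reshuffle the (feature, caption) pairs at the start of every epoch,
    # instead of shuffling only once before training.
    rng = np.random.RandomState(seed)
    n = len(captions)
    for epoch in range(n_epochs):
        order = rng.permutation(n)                     # fresh order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield epoch, features[idx], captions[idx]  # assumes NumPy arrays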

Liu0329 avatar Jun 29 '16 03:06 Liu0329

Yes, you are totally right. I should have corrected my code, but I have been too busy lately. I will work on it as soon as possible.

Thank you for pointing the bugs out. -Taeksoo

jazzsaxmafia avatar Jun 29 '16 06:06 jazzsaxmafia

@Liu0329 @jazzsaxmafia Agree with Liu0329. Is there another bug in the next line: logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b should be replaced by logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b + tf.matmul(weighted_context, self.decode_lstm_image_W) + tf.matmul(word_emb, self.decode_lstm_word_W)

As in Eq. (7) of the original paper, shouldn't the logits be computed as W^T concat(weighted_context, word_emb, h)?
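A minimal sketch of that three-part deep output layer, in the spirit of Eq. (7); the weight names here are illustrative placeholders passed in as arguments, not identifiers from this repo:

import tensorflow as tf

def deep_output_logits(h, weighted_context, word_emb,
                       W_h, W_ctx, W_word, b, W_out, b_out):
    # Eq. (7): the pre-softmax activation combines the LSTM hidden state,
    # the attended image context, and the previous word embedding.
    pre = (tf.matmul(h, W_h)
           + tf.matmul(weighted_context, W_ctx)
           + tf.matmul(word_emb, W_word)
           + b)
    pre = tf.nn.tanh(pre)                     # deep output nonlinearity
    return tf.matmul(pre, W_out) + b_out      # logits over the vocabulary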

RutgersHan avatar Aug 11 '16 21:08 RutgersHan

@RutgersHan Have you managed to train a usable model?

Liu0329 avatar Aug 12 '16 08:08 Liu0329

@Liu0329 I did not use the attention model for image captioning; I used it for another task. For me the result is not very good: it is only a little better than the same model without attention. I am not sure whether that is due to my implementation or to something else, which is why I asked the previous question: do you think there is a bug in the logits computation in this implementation?

RutgersHan avatar Aug 12 '16 22:08 RutgersHan

@RutgersHan Yes, according to the paper the logits have three parts, as you showed. I will now try training again. This repo seems unfinished; many parts are simplified compared with the original code. We can help improve it.

Liu0329 avatar Aug 20 '16 03:08 Liu0329

@Liu0329 @jazzsaxmafia @RutgersHan

Hello, I am currently reading the code, and thanks for the post! For the alpha calculation, aren't we supposed to use the context ('context_encode') and the previous hidden state? In this implementation, however, it seems that the projections of all the previous hidden states are being summed into context_encode:

context_encode = context_encode + \
    tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
    self.pre_att_b
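For comparison, a minimal sketch of the per-step soft attention described in the paper: the energy for each of the L annotation vectors is recomputed at every step from the fixed image projection and only the previous hidden state, rather than accumulated across steps. All names below are illustrative, not the repo's:

import tensorflow as tf

def soft_attention(context, h_prev, W_ctx, W_h, w_att, b_att):
    # context: [batch, L, D] annotation vectors; h_prev: [batch, hidden]
    ctx_proj = tf.tensordot(context, W_ctx, axes=[[2], [0]])        # [batch, L, att_dim]
    h_proj = tf.expand_dims(tf.matmul(h_prev, W_h), 1)              # [batch, 1, att_dim]
    e = tf.tensordot(tf.nn.tanh(ctx_proj + h_proj + b_att),
                     w_att, axes=[[2], [0]])                        # [batch, L] energies
    alpha = tf.nn.softmax(e)                                        # attention weights
    weighted_context = tf.reduce_sum(
        context * tf.expand_dims(alpha, 2), axis=1)                 # [batch, D]
    return alpha, weighted_context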

Thanks!

2g-XzenG avatar Oct 30 '16 19:10 2g-XzenG

@Liu0329 @jazzsaxmafia @RutgersHan @1230pitchanqw Hello, I'm an undergraduate from Communication University of China, currently working in this area for my graduation thesis. Could you please tell me whether you have obtained a satisfactory result based on this repo? If so, how many epochs did you need to train a good model, and what else did you change in the original code? I would appreciate a reply. Thank you very much.

sjksong avatar Mar 03 '19 06:03 sjksong