Multi-modal-Circulant-Fusion
Need some clarity in the code
In lines 41 and 42 of fusion.py, instead of

    fusion_visual = tf.reduce_mean(visual_vector * visual_f, 1)
    fusion_text = tf.reduce_mean(text_vector * text_f, 1)

shouldn't it be

    fusion_visual = tf.reduce_mean(text_vector * visual_f, 1)
    fusion_text = tf.reduce_mean(visual_vector * text_f, 1)

since we want to generate sentence-aware visual embeddings?
Yes, it should be

    fusion_visual = tf.reduce_mean(text_vector * visual_f, 1)
    fusion_text = tf.reduce_mean(visual_vector * text_f, 1)

Please see lines 53 and 54 of fusion.py.
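For anyone reading along, here is a minimal sketch of the cross-modal interaction discussed above. It is not the code from fusion.py: it uses NumPy instead of TensorFlow, works on a single example without a batch dimension (so the mean is taken over axis 0 rather than axis 1), and assumes that visual_f and text_f are circulant matrices built from the visual and text vectors. The helper names circulant and cross_modal_fusion are hypothetical.

```python
import numpy as np


def circulant(v):
    """Stack the circular shifts of v as rows (assumed form of visual_f / text_f)."""
    return np.stack([np.roll(v, i) for i in range(len(v))])


def cross_modal_fusion(visual_vector, text_vector):
    """Hypothetical helper: each modality's vector multiplies the OTHER
    modality's circulant matrix, then the rows are averaged."""
    visual_f = circulant(visual_vector)  # shape (d, d), assumption
    text_f = circulant(text_vector)      # shape (d, d), assumption

    # The vector broadcasts across every row of the other modality's matrix.
    # axis=0 here plays the role of axis 1 in the batched TensorFlow code.
    fusion_visual = np.mean(text_vector * visual_f, axis=0)
    fusion_text = np.mean(visual_vector * text_f, axis=0)
    return fusion_visual, fusion_text


if __name__ == "__main__":
    visual = np.array([1.0, 2.0, 3.0])
    text = np.array([1.0, 0.0, 2.0])
    fusion_visual, fusion_text = cross_modal_fusion(visual, text)
    print(fusion_visual, fusion_text)
```

With the original pairing (visual_vector * visual_f), fusion_visual would depend only on the visual input, so the text could not influence it. With the swapped pairing, each fused vector depends on both modalities.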
Thank you very much.