dmn-tensorflow
softmax in def attention() is wrong?
Hi, first of all, thank you very much for the code; I learnt a lot!
In models/new/episode_module.py, line 59, l2 = tf.nn.softmax(l2) is applied over the batch (N) dimension, but according to formula (10) in the paper https://arxiv.org/pdf/1603.01417v1.pdf the softmax should be applied over the fact (F) dimension. Is that the case?
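To make the difference concrete, here is a small NumPy-only sketch (the [F, N] layout of the stacked gate scores is an assumption on my part; the only point is which axis the softmax runs over):

    import numpy as np

    def softmax(x, axis):
        # numerically stable softmax along the given axis
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    # hypothetical attention scores: F facts stacked along axis 0, batch of N examples along axis 1
    scores = np.array([[1.0, 0.5],
                       [2.0, 0.5],
                       [3.0, 0.5]])  # shape [F=3, N=2]

    over_facts = softmax(scores, axis=0)  # formula (10): each column (one example) sums to 1
    over_batch = softmax(scores, axis=1)  # normalizing over the batch instead, as line 59 seems to do

    print(over_facts.sum(axis=0))  # [1. 1.]    -> a distribution over the F facts per example
    print(over_batch.sum(axis=1))  # [1. 1. 1.] -> a distribution over the batch, not what we want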
Please let me know. Thank you!
Best regards,
A fix could be to comment out line 59, l2 = tf.nn.softmax(l2), and revise new() as follows:
    def new(self, memory):
        """ Creates new episode vector (will feed into Episodic Memory GRU)
        :param memory: Previous memory vector
        :return: episode vector
        """
        state = self.init_state
        memory = tf.transpose(memory)  # [N, D]

        with tf.variable_scope('AttnGate') as scope:
            # collect the un-normalized attention gates for every fact
            gs = []
            for f, f_t in zip(self.facts, self.facts_transposed):
                g = self.attention(f, memory)
                gs.append(g)
            gs = tf.pack(gs)
            gs = tf.nn.softmax(gs, dim=0)  # normalize over the fact dimension, as in formula (10)
            gs = tf.unpack(gs)

        with tf.variable_scope('AttnGate_update') as scope:
            for f, f_t, g in zip(self.facts, self.facts_transposed, gs):
                state = self.gru(f_t, state, g)
                scope.reuse_variables()  # share params
        return state
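As a side note (not part of the proposed fix): tf.pack / tf.unpack were renamed to tf.stack / tf.unstack in TensorFlow 1.0, and newer releases take axis instead of dim for the softmax dimension, so on a recent TF the middle block would read something like:

            gs = tf.stack(gs)               # [F, N]
            gs = tf.nn.softmax(gs, axis=0)  # still a softmax over the fact dimension
            gs = tf.unstack(gs)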
Yes, it is wrong. The paper says we should compute the normalized distribution over the attention gates across the outputs of the fusion layer, i.e. across the facts.
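For reference, my reading of formula (10) in the paper is a per-fact gate of the form

    g^t_i = \frac{\exp(z^t_i)}{\sum_{k=1}^{F} \exp(z^t_k)}

where z^t_i is the attention score for fact i at pass t (my notation), so the normalization runs over the F facts of one example, not over the N examples in a batch.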