Is there a bug in batch_gather?
In independent.py, this line

    top_fast_antecedent_scores = util.batch_gather(fast_antecedent_scores, top_antecedents)  # [k, c]

sometimes returns [NaN, NaN, ...]. I printed the tensor values with tf.Print() to investigate.
Here is batch_gather in util.py (with my tf.Print debugging and a txt argument added):
def batch_gather(txt, emb, indices):
    batch_size = shape(emb, 0)
    seqlen = shape(emb, 1)
    if len(emb.get_shape()) > 2:
        emb_size = shape(emb, 2)
    else:
        emb_size = 1
    flattened_emb = tf.reshape(emb, [batch_size * seqlen, emb_size])  # [batch_size * seqlen, emb]
    offset = tf.expand_dims(tf.range(batch_size) * seqlen, 1)  # [batch_size, 1]
    gathered = tf.gather(flattened_emb, indices + offset)  # [batch_size, num_indices, emb]
    gathered = tf.Print(gathered, [gathered], message='gathered')
    if len(emb.get_shape()) == 2:
        gathered = tf.squeeze(gathered, 2)  # [batch_size, num_indices]
    gathered = tf.Print(gathered, [gathered], message=txt + 'gathered2')
    return gathered
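For reference, the function looks correct on small finite inputs; a minimal sanity-check sketch (TF 1.x; assumes util.py's shape helper is in scope):

import tensorflow as tf  # TF 1.x

emb = tf.constant([[10., 11., 12.],
                   [20., 21., 22.]])     # [2, 3]
indices = tf.constant([[0, 2], [1, 1]])  # [2, 2]
gathered = batch_gather('demo ', emb, indices)

with tf.Session() as sess:
    print(sess.run(gathered))  # [[10. 12.] [21. 21.]]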
The tf.Print output during training is as follows:

emb == [[-inf -inf -inf ...] ...]
flattened_emb == [[-inf] [-inf] [-inf] ...]
indices + offset == [[808 809 810 ...] ...]
gathered == [[[nan] [nan] [nan]] ...]
gathered2 == [[nan nan nan ...] ...]
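For context, the -inf values in the scores presumably come from additive log-masking, where disallowed antecedent positions get score + log(0) = -inf. A generic sketch of that pattern (TF 1.x; not necessarily the repo's exact code):

import tensorflow as tf  # TF 1.x

scores = tf.constant([[0.5, 1.2], [0.3, -0.7]])
mask = tf.constant([[True, False], [True, True]])
# log(1) = 0 leaves allowed positions unchanged;
# log(0) = -inf pushes disallowed positions to -inf.
masked_scores = scores + tf.log(tf.to_float(mask))

with tf.Session() as sess:
    print(sess.run(masked_scores))  # [[0.5 -inf] [0.3 -0.7]]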
Sometimes it doesn't happen (training runs fine), but it happens frequently, and training stops because the loss becomes NaN. Can you help me?
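One observation that may narrow this down: tf.gather only copies values, so with in-range indices it cannot turn -inf into NaN by itself; the NaNs are presumably already present somewhere in emb (tf.Print only shows the first few elements). A small demonstration (TF 1.x):

import numpy as np
import tensorflow as tf  # TF 1.x

flat = tf.constant([[-np.inf], [-np.inf], [3.0]])  # like flattened_emb, [3, 1]
out = tf.gather(flat, [[0, 2], [1, 1]])            # [2, 2, 1]

with tf.Session() as sess:
    print(sess.run(out))  # -inf passes through unchanged; no NaN appears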
Another error case:
emb: [[-inf -inf -inf ...] ...]               (shape [480, 480])
offset: [[0] [480] [960] ...]                 (shape [480, 1])
indices: [[316 0 1 ...] ...]                  (shape [480, 50])
indices + offset: [[316 0 1 ...] ...]         (shape [480, 50])
flattened_emb: [[-inf] [-inf] [-inf] ...]     (shape [230400, 1])
gathered: [[[nan] [-inf] [-inf]] ...]         (shape [480, 50, 1])
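Here the first row of indices + offset is [316 0 1 ...] and the first gathered value is nan while positions 0 and 1 give -inf, which suggests flattened_emb[316] already holds a NaN even though the printed prefix shows only -inf. To localize the first NaN, a probe I could add inside batch_gather (TF 1.x; tf.check_numerics would also trip on the intentional -inf mask values, so tf.is_nan is safer here):

emb_has_nan = tf.reduce_any(tf.is_nan(emb))
flat_has_nan = tf.reduce_any(tf.is_nan(flattened_emb))
gathered = tf.Print(gathered, [emb_has_nan, flat_has_nan],
                    message='NaN present in emb / flattened_emb: ')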
IIRC, I didn't change that part of the code from the original e2e-coref. Are you seeing this on English OntoNotes? It's possibly related to https://github.com/mandarjoshi90/coref/issues/6.