Is there a bug here?
https://github.com/mandarjoshi90/coref/blob/bdd15253d174a6a9e155b578ea8e53a46a9aff4c/independent.py#L231
Hi, I think an offset should only be the first term, not the result of the subtraction.
Apologies for the late response. IIRC, I don't think I changed that part of the code from e2e-coref. You could be right, though. At a quick glance, it would seem that it's computing distances between indices of the span pairs. That should still be fine for the mask in the following line but less so for the distances.
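For reference, a minimal sketch of the kind of pairwise index subtraction being discussed. It assumes the linked line subtracts two broadcast copies of an index range; the names top_span_range and antecedents_mask and the value of k are illustrative assumptions, and plain Python lists stand in for tensors.

```python
# Hypothetical stand-in for the tensor code: k top spans, indexed 0..k-1.
k = 4
top_span_range = list(range(k))

# antecedent_offsets[i][j] = i - j.
# Positive when span j comes before span i, zero on the diagonal,
# negative when span j comes after span i.
antecedent_offsets = [[i - j for j in top_span_range] for i in top_span_range]

# Assumed mask: only strictly earlier spans count as antecedents.
antecedents_mask = [[offset >= 1 for offset in row] for row in antecedent_offsets]

for row in antecedent_offsets:
    print(row)
# [0, -1, -2, -3]
# [1, 0, -1, -2]
# [2, 1, 0, -1]
# [3, 2, 1, 0]
```

The mask only depends on the sign, so it is unaffected, but every entry above the diagonal is a negative "distance" that is still present in the offsets themselves.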
Hi,
Thanks for the reply.
I was worried about the negative values in antecedent_offsets.
Maybe we only need to consider the positive ones (they are real distances), and the negative ones will be masked out later on?
Right. The bucket_distance function will mask out the negative values.
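To illustrate what "masking out" could look like, here is a scalar sketch. It assumes bucket_distance uses semi-log buckets: identity for distances up to 4, floor(log2) + 3 above that, clipped to the range 0..9. The function name bucket_distance_sketch and those thresholds are assumptions for illustration, not the repository's TensorFlow code.

```python
import math


def bucket_distance_sketch(distance: int) -> int:
    """Hypothetical scalar bucketing; not the repository's implementation."""
    if distance <= 4:
        # Identity branch. Negative distances also land here.
        idx = distance
    else:
        # Log-scale branch, only reached for distances of 5 or more.
        idx = int(math.floor(math.log2(distance))) + 3
    # Clip to the assumed bucket range 0..9.
    return max(0, min(idx, 9))


for d in (-3, 0, 1, 4, 5, 8, 64):
    print(d, bucket_distance_sketch(d))
```

Under these assumptions a negative offset clips to bucket 0, the same bucket as a distance of 0. A vectorized version may still evaluate the log on every element, including negative ones, which this sketch deliberately avoids and does not model.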
Hi,
I'm still confused after taking another close look. My understanding is that top_antecedent_offsets will contain negative values, since antecedent_offsets has negative values.
As a result, there will be some unexpected behavior within the bucket_distance function, since it takes the log of top_antecedent_offsets.
Also, I think the top_fast_antecedent_scores in
https://github.com/mandarjoshi90/coref/blob/master/independent.py#L324 may need to be recomputed inside the loop, since top_span_emb gets updated after every iteration.
Maybe? I suspect it doesn't matter since the slow scores are doing the heavy lifting. Happy to accept a PR though if you're seeing an improvement with that change :)