OpenNMT-py
Coverage loss implementation might be wrong.

Following See et al. (2017) (abisee), the coverage vector is defined as

`c^t = sum_{t'=0}^{t-1} a^{t'}`

but in the code implementation below, the coverage at timestep `t` seems to contain `a^t` itself, while it should be the sum over steps `0` to `t-1` only:
https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/decoders/decoder.py#L412-L415
so I think L413 could be replaced like this:

```python
if coverage is None:
    coverage = torch.zeros(p_attn.size(), device=p_attn.device)
else:
    coverage = coverage + attns["std"][-2]
```
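To illustrate the intended semantics, here is a pure-Python toy sketch (hypothetical attention values, no torch) of the definition `c^t = sum_{t'=0}^{t-1} a^{t'}` — the coverage used at step `t` excludes the current step's attention:

```python
# Toy sketch (pure Python, hypothetical values) of the coverage vector
# from See et al. (2017): c^t is the elementwise sum of the attention
# distributions from steps 0 .. t-1, NOT including the current a^t.

def coverage_before_step(attn_history, t):
    """Coverage c^t = a^0 + ... + a^{t-1} (elementwise)."""
    n = len(attn_history[0]) if attn_history else 0
    c = [0.0] * n
    for a in attn_history[:t]:  # strictly before step t
        c = [ci + ai for ci, ai in zip(c, a)]
    return c

# three decoding steps over a 3-token source (made-up attention weights)
attns = [
    [0.7, 0.2, 0.1],  # a^0
    [0.1, 0.8, 0.1],  # a^1
    [0.2, 0.2, 0.6],  # a^2
]

print(coverage_before_step(attns, 0))  # -> [0.0, 0.0, 0.0]: nothing covered yet
print(coverage_before_step(attns, 2))  # a^0 + a^1 (approx. [0.8, 1.0, 0.2])
```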
https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/utils/loss.py#L308-L311

and the coverage loss could be replaced like this:

```python
def _compute_coverage_loss(self, std_attn, coverage_attn):
    covloss = torch.min(std_attn, coverage_attn).sum(dim=-1).view(-1)
    covloss *= self.lambda_coverage
    return covloss
    # add ignoring pad at criterion
    # coverage_loss[target==self.padding_idx] = 0
```
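A pure-Python sketch (toy, hypothetical values; no torch) of what this loss computes, `covloss = lambda_coverage * sum_i min(a^t_i, c^t_i)`:

```python
# Toy mirror of the proposed _compute_coverage_loss, without torch.
# The loss penalizes attention mass placed on positions that the
# decoder has already covered in earlier steps.

def coverage_loss(std_attn, coverage_attn, lambda_coverage=1.0):
    return lambda_coverage * sum(
        min(a, c) for a, c in zip(std_attn, coverage_attn)
    )

a_t = [0.1, 0.8, 0.1]  # current attention a^t (hypothetical)
c_t = [0.7, 0.2, 0.1]  # coverage of steps 0..t-1 (here just a^0)

# elementwise mins are [0.1, 0.2, 0.1], so the loss is approx. 0.4
print(coverage_loss(a_t, c_t))
```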
Please check whether this is correct; if necessary, I'll open a pull request. Thank you.
I think this is done on purpose, but @pltrdy may explain better.
It actually seems like a mistake to me: with a^t included in the summation, we get min(a^t, c^t) = a^t, which does not really make sense.
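A quick pure-Python check (toy, hypothetical values) of why the min collapses: attention weights are non-negative, so if c^t already contains a^t, then c^t_i >= a^t_i elementwise and min(a^t_i, c^t_i) is always a^t_i:

```python
# Sketch of the degenerate case: coverage that wrongly includes the
# current step's attention a^t.

a_t = [0.1, 0.8, 0.1]                            # hypothetical a^t
c_prev = [0.7, 0.2, 0.1]                         # sum of a^0 .. a^{t-1}
c_wrong = [p + a for p, a in zip(c_prev, a_t)]   # buggy: includes a^t

mins = [min(a, c) for a, c in zip(a_t, c_wrong)]
assert mins == a_t  # the elementwise min collapses to a^t itself

# the "coverage" loss then just sums a^t (approx. 1.0 for a normalized
# attention distribution), carrying no information about past coverage
print(sum(mins))
```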
@Sanghyuk-Choi if you're still around, can you open a PR?
Yes, I'll make a PR within this week.