OpenNMT-py
Coverage loss implementation might be wrong.

Following See et al. (2017) (abisee), the coverage vector is defined as

`c^t = sum_{t'=0}^{t-1} a^{t'}`

but in the code implementation below, the coverage at timestep `t` seems to contain `a^t` itself, while it should be the sum over steps `0` to `t-1` only:
https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/decoders/decoder.py#L412-L415
so I think L413 could be replaced like this:

```python
if coverage is None:
    coverage = torch.zeros(p_attn.size(), device=p_attn.device)
else:
    coverage = coverage + attns["std"][-2]
```
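To illustrate the intended semantics, here is a pure-Python toy sketch (hypothetical attention values, no torch) of the definition `c^t = sum_{t'=0}^{t-1} a^{t'}` — the coverage used at step `t` excludes the current step's attention:

```python
# Toy sketch (pure Python, hypothetical values) of the coverage vector
# from See et al. (2017): c^t is the elementwise sum of the attention
# distributions from steps 0 .. t-1, NOT including the current a^t.

def coverage_before_step(attn_history, t):
    """Coverage c^t = a^0 + ... + a^{t-1} (elementwise)."""
    n = len(attn_history[0]) if attn_history else 0
    c = [0.0] * n
    for a in attn_history[:t]:  # strictly before step t
        c = [ci + ai for ci, ai in zip(c, a)]
    return c

# three decoding steps over a 3-token source (made-up attention weights)
attns = [
    [0.7, 0.2, 0.1],  # a^0
    [0.1, 0.8, 0.1],  # a^1
    [0.2, 0.2, 0.6],  # a^2
]

print(coverage_before_step(attns, 0))  # -> [0.0, 0.0, 0.0]: nothing covered yet
print(coverage_before_step(attns, 2))  # a^0 + a^1 (approx. [0.8, 1.0, 0.2])
```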
https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/utils/loss.py#L308-L311

and the coverage loss could be replaced like this:

```python
def _compute_coverage_loss(self, std_attn, coverage_attn):
    covloss = torch.min(std_attn, coverage_attn).sum(dim=-1).view(-1)
    covloss *= self.lambda_coverage
    return covloss
    # add ignoring pad at criterion
    # coverage_loss[target==self.padding_idx] = 0
```
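A pure-Python sketch (toy, hypothetical values; no torch) of what this loss computes, `covloss = lambda_coverage * sum_i min(a^t_i, c^t_i)`:

```python
# Toy mirror of the proposed _compute_coverage_loss, without torch.
# The loss penalizes attention mass placed on positions that the
# decoder has already covered in earlier steps.

def coverage_loss(std_attn, coverage_attn, lambda_coverage=1.0):
    return lambda_coverage * sum(
        min(a, c) for a, c in zip(std_attn, coverage_attn)
    )

a_t = [0.1, 0.8, 0.1]  # current attention a^t (hypothetical)
c_t = [0.7, 0.2, 0.1]  # coverage of steps 0..t-1 (here just a^0)

# elementwise mins are [0.1, 0.2, 0.1], so the loss is approx. 0.4
print(coverage_loss(a_t, c_t))
```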
Please check whether this is correct; if necessary, I'll open a pull request. Thank you.
I think this is done on purpose, but @pltrdy may explain better.
It actually seems like a mistake to me: with a^t included in the summation, we get min(a^t, c^t) = a^t, which does not really make sense.
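A quick pure-Python check (toy, hypothetical values) of why the min collapses: attention weights are non-negative, so if c^t already contains a^t, then c^t_i >= a^t_i elementwise and min(a^t_i, c^t_i) is always a^t_i:

```python
# Sketch of the degenerate case: coverage that wrongly includes the
# current step's attention a^t.

a_t = [0.1, 0.8, 0.1]                            # hypothetical a^t
c_prev = [0.7, 0.2, 0.1]                         # sum of a^0 .. a^{t-1}
c_wrong = [p + a for p, a in zip(c_prev, a_t)]   # buggy: includes a^t

mins = [min(a, c) for a, c in zip(a_t, c_wrong)]
assert mins == a_t  # the elementwise min collapses to a^t itself

# the "coverage" loss then just sums a^t (approx. 1.0 for a normalized
# attention distribution), carrying no information about past coverage
print(sum(mins))
```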
@Sanghyuk-Choi if you're still around, can you open a PR?
Yes, I'll make a PR within this week.