pointer-generator
The question about coverage loss
Recently I adapted your project to another dataset, in particular the coverage loss, but I can't reproduce its ability to avoid repetition. I'm sure I followed the instructions in your paper, so I'd like to know whether there is anything else I should be aware of. Looking at the coverage loss, "sum(min(a,c))": if the decoded sequence is long enough, c may become a vector whose values are all greater than 1, and then "sum(min(a,c))" may be 1 forever. Is there some improvement for this?
Yes, the coverage vector c increases monotonically and is unbounded. It is possible for sum(min(a,c)) to be 1 forever if the attention a always attends to something that's already been covered. This is by design.
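To make the behaviour above concrete, here is a minimal sketch in plain Python (not the repository's TensorFlow code). It assumes the per-step definition sum(min(a,c)), with c being the sum of the attention distributions from all previous decoder steps; the function name `coverage_losses` and the toy attention values are made up for illustration.

```python
def coverage_losses(attn_dists):
    """Return the per-step coverage loss sum(min(a, c)).

    attn_dists: list of attention distributions, one per decoder step,
    each a list of floats that sums to 1 (hypothetical toy input).
    """
    # Coverage starts at zero: nothing has been attended to yet.
    coverage = [0.0] * len(attn_dists[0])
    losses = []
    for attn in attn_dists:
        # Penalise attention mass placed on already-covered positions.
        losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
        # Coverage only ever grows; it is never clipped or normalised.
        coverage = [c + a for c, a in zip(coverage, attn)]
    return losses


# Attention stuck on the same source position: the loss saturates at 1.
stuck = [[1.0, 0.0, 0.0]] * 3
print(coverage_losses(stuck))   # [0.0, 1.0, 1.0]

# Attention that moves to a fresh position each step: the loss stays 0.
moving = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(coverage_losses(moving))  # [0.0, 0.0, 0.0]
```

Because each attention distribution sums to 1, the per-step loss can never exceed 1. A loss stuck at 1 therefore means every step is spending all of its attention on positions that have already been covered.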
@abisee Could you please elaborate on the intuition behind this coverage loss? I haven't read the code carefully, but according to the paper, this loss acts as a regularization term alongside the NLL, and we try to minimize both. But how does this help to avoid repetition?
@Gandor26 For intuition see the blog post.