tensor2tensor
[Question] Inadequate translation
NMT is better than SMT in fluency, but it suffers from inadequate translation of long sentences. I have come across research where coverage is modelled explicitly for NMT. Does the Transformer have a better solution for this (an optimal setting, etc.)?
In this paper (https://arxiv.org/pdf/1609.08144.pdf), they add a coverage penalty on top of beam search to favor translations that fully cover the source sentence according to the attention module. The scoring function s(Y, X) they use to rank candidate translations is defined as follows (Equation 14, page 12 of the paper):
s(Y, X) = log(P(Y|X)) / lp(Y) + cp(X; Y)
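where, if I read Equation 14 correctly, the two terms are

lp(Y) = (5 + |Y|)^α / (5 + 1)^α
cp(X; Y) = β * Σ_i log(min(Σ_j p_{i,j}, 1.0))

with p_{i,j} the attention probability of the j-th target word y_j on the i-th source word x_i, and α, β the length-normalization and coverage-penalty strengths.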
The first part of s(Y, X) is the length normalization, which I found here in the Transformer code, but for the second part, cp(X; Y), the coverage penalty, I couldn't find which piece of code implements it in the Transformer decoder. Does the Transformer have a better solution for this?
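Just to make the coverage term concrete, here is a minimal NumPy sketch (not tensor2tensor code; the function and argument names are illustrative) of how a finished beam hypothesis could be rescored with the GNMT-style penalty, assuming the decoder's cross-attention weights for that hypothesis are available:

```python
import numpy as np

def length_penalty(target_len, alpha=0.6):
    # lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha
    return ((5.0 + target_len) ** alpha) / ((5.0 + 1.0) ** alpha)

def coverage_penalty(attention, beta=0.2):
    # attention[j, i] = p_{i,j}: attention of the j-th target word on the
    # i-th source word, shape [target_len, source_len].
    # Total attention each source word received, capped at 1.0 so that
    # fully covered source words contribute log(1) = 0.
    per_source = np.minimum(attention.sum(axis=0), 1.0)
    # Softmax attention is strictly positive, but clip to avoid log(0)
    # in case some source positions were masked out.
    return beta * np.log(np.maximum(per_source, 1e-9)).sum()

def rescore(log_prob, attention, alpha=0.6, beta=0.2):
    # s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y) for a single candidate.
    target_len = attention.shape[0]
    return (log_prob / length_penalty(target_len, alpha)
            + coverage_penalty(attention, beta))
```

With multi-head, multi-layer attention in the Transformer there is no single p_{i,j}, so one would have to pick or average heads (e.g. the last decoder layer's encoder-decoder attention averaged over heads), which is probably part of why this is less straightforward than in the RNN case.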
@crystal0913: I think inadequate translation is a common problem with NMT in the community. Adding a penalty can help, but it is not a complete solution. I am thinking of hybrid translation with statistical MT.
Is there any schedule for adding a beta parameter for the coverage penalty?