Paul Tardy

Results 34 comments of Paul Tardy

Actually, the coverage mechanism isn't implemented for transformer decoders. Coverage comes from See et al. (2017), which is based on RNNs (an LSTM, actually) and therefore a single attention head. It's not clear...
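For reference, a minimal PyTorch-style sketch of that single-head coverage mechanism (my own simplification, not the OpenNMT-py code; in the paper the coverage feature enters the attention score inside a `tanh`, here it is just added to the scores):

```python
import torch

def coverage_attention_step(scores, coverage, w_c):
    """One decoder step with See-style coverage (simplified sketch).

    scores:   (batch, src_len) raw attention scores for the current step
    coverage: (batch, src_len) sum of the attention distributions of previous steps
    w_c:      learned scalar weighting the coverage feature
    """
    # Coverage is fed back into the attention before the softmax, so the model
    # can learn to avoid re-attending to already-covered source tokens.
    attn = torch.softmax(scores + w_c * coverage, dim=-1)
    # Coverage loss: penalize overlap between current attention and coverage.
    cov_loss = torch.min(attn, coverage).sum(dim=-1)
    # Coverage is updated *after* the loss, so a^t is not part of c^t.
    new_coverage = coverage + attn
    return attn, new_coverage, cov_loss
```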

Well, at least it gives some guidelines to implement coverage in the Transformer. Feel free to implement this paper and open a PR; we would review it. Results show some...

@Qnlp Absolutely. And better results as well. The Transformer has many heads, and has encoder self-attention, decoder self-attention AND cross-attention (instead of a single cross-attention layer in RNNs), so it may generalize the concept...
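Just to sketch what "generalizing" could look like (purely hypothetical, this is not implemented in OpenNMT-py): one option is to reduce the per-head cross-attention to a single distribution per step and accumulate it over decoding steps:

```python
import torch

def transformer_coverage_loss(cross_attn):
    """Hypothetical coverage loss on Transformer cross-attention.

    cross_attn: (batch, heads, tgt_len, src_len) weights from one decoder layer
    """
    # Reduce heads to one distribution per target step (mean is one choice;
    # max or a learned combination would be alternatives).
    attn = cross_attn.mean(dim=1)                     # (batch, tgt_len, src_len)
    # Coverage at step t = attention accumulated over *previous* steps only.
    coverage = attn.cumsum(dim=1) - attn
    # Same min-overlap penalty as in the RNN case, applied at every step.
    cov_loss = torch.min(attn, coverage).sum(dim=-1)  # (batch, tgt_len)
    return cov_loss.mean()

# toy usage with random attention maps: (batch=2, heads=4, tgt_len=6, src_len=10)
attn = torch.softmax(torch.randn(2, 4, 6, 10), dim=-1)
print(transformer_coverage_loss(attn))
```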

@mahimanzum sorry, didn't check GitHub notifs for a while. There's no option to do it directly in OpenNMT-py, it would require a few tweaks. First, the attention weights are...
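If it helps, the generic PyTorch pattern for grabbing attention weights is a forward hook on the attention module; the sketch below uses `nn.MultiheadAttention` as a stand-in, the actual module path inside OpenNMT-py will differ:

```python
import torch
import torch.nn as nn

captured = []

def save_attention(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights)
    _, attn_weights = output
    captured.append(attn_weights.detach().cpu())

# Stand-in for the decoder's cross-attention module.
cross_attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
handle = cross_attn.register_forward_hook(save_attention)

tgt = torch.randn(1, 3, 8)   # (batch, tgt_len, dim)
src = torch.randn(1, 5, 8)   # (batch, src_len, dim)
cross_attn(tgt, src, src)    # weights land in `captured` via the hook
handle.remove()

print(captured[0].shape)     # (1, 3, 5): one weight per (target, source) pair
```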

@flauted do you get normal scores without coverage penalty? The problem is probably not about beta, though.

Could you try with the parameters of …? In particular, using `-coverage_penalty summary`

Would be interesting to check prediction scores, in particular to look at how many sentences get `inf` scores
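For context (my reading, assuming a Wu et al. (2016) style coverage penalty, or any penalty with a log term, is involved): the penalty takes a log of the clipped attention mass per source token, so any source token that receives zero total attention makes the whole hypothesis score $-\infty$:

$$
cp(X; Y) = \beta \sum_{i=1}^{|X|} \log\!\Big(\min\Big(\sum_{t=1}^{|Y|} a_i^{t},\; 1.0\Big)\Big)
$$

where $a_i^t$ is the attention on source token $i$ at decoding step $t$.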

It actually seems like a mistake to me: if `a^t` is included in the summation (i.e. counted in `c^t`), then `min(a^t, c^t) = a^t`, which does not really make sense.
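Spelling it out with the usual definitions ($a^t$ = attention distribution at step $t$, $c^t$ = coverage vector): if the sum runs up to $t$ instead of $t-1$, the coverage loss collapses to a constant,

$$
c^t = \sum_{t'=1}^{t} a^{t'} \;\Rightarrow\; c_i^t \ge a_i^t \;\Rightarrow\; \sum_i \min(a_i^t, c_i^t) = \sum_i a_i^t = 1,
$$

so the loss no longer depends on where the model attends; with the sum up to $t-1$ the min actually measures re-attention to already-covered tokens.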

Ok, it makes sense. I found some results where the difference was around 9 ROUGE points (on 11.5k sentences), which is not close at all. I may have made a mistake...

Sorry for the delay, could you open a PR on that? Thanks