Paul Tardy
Actually, the coverage mechanism isn't implemented for Transformer decoders. Coverage comes from See et al. (2017), which is based on RNNs instead (LSTMs, actually), hence a single attention head. It's not clear...
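For reference, the coverage mechanism in that paper keeps a running sum of the past attention distributions and adds a loss term penalizing attention on already-covered source tokens:

```latex
c^t = \sum_{t'=0}^{t-1} a^{t'}, \qquad \text{covloss}_t = \sum_i \min\left(a_i^t, c_i^t\right)
```

Note that `c^t` only sums the attention from steps strictly before `t`.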
Well, at least it gives some guidelines for implementing coverage in the Transformer. Feel free to implement this paper and open a PR; we would review it. Results show some...
@Qnlp Absolutely, and better results as well. The Transformer has many heads, and it has encoder self-attention, decoder self-attention AND cross-attention (instead of a single cross-attention layer in RNNs), so it may generalize the concept... One possible generalization is sketched below.
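Purely as a sketch of what a multi-head generalization could look like (hypothetical, not something implemented in OpenNMT-py): average the cross-attention over heads, then apply the same `min(a^t, c^t)` loss per decoder step:

```python
import torch

def multihead_coverage_loss(cross_attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical coverage loss for a Transformer decoder.

    cross_attn: (batch, heads, tgt_len, src_len) cross-attention weights.
    Averages over heads, then applies the See-style min(a^t, c^t) per step.
    """
    attn = cross_attn.mean(dim=1)              # (batch, tgt_len, src_len)
    cov = torch.cumsum(attn, dim=1) - attn     # c^t excludes a^t itself
    return torch.min(attn, cov).sum(dim=-1).mean()

# dummy usage
attn = torch.softmax(torch.randn(2, 8, 10, 15), dim=-1)
print(multihead_coverage_loss(attn))
```

Whether to average heads, pick one head, or keep a separate coverage vector per head is exactly the kind of design question such a paper would have to settle.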
@mahimanzum sorry, I didn't check GitHub notifications for a while. There's no option to do it directly in OpenNMT-py; it would require a few tweaks. First, the attention weights are...
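In the meantime, one generic way to grab them (not an OpenNMT-py API, just a plain PyTorch forward hook, shown here on `nn.MultiheadAttention`, which returns the weights as its second output):

```python
import torch
import torch.nn as nn

captured = []

def save_attn(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights)
    captured.append(output[1].detach().cpu())

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
handle = mha.register_forward_hook(save_attn)

q = torch.randn(2, 5, 16)   # (batch, tgt_len, dim)
kv = torch.randn(2, 7, 16)  # (batch, src_len, dim)
mha(q, kv, kv, need_weights=True)

handle.remove()
print(captured[0].shape)  # (batch, tgt_len, src_len), averaged over heads
```

You would register the hook on the decoder's cross-attention module and run translation as usual.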
@flauted do you get normal scores without the coverage penalty? The problem is probably not about beta, though.
Could you try with the parameters from the paper? In particular, using `-coverage_penalty summary`.
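For context, `wu` is the GNMT coverage penalty (Wu et al., 2016), which penalizes source tokens that receive less than 1.0 total attention, while `summary` (as I read it; this is a paraphrase, check `onmt/translate/penalties.py` for the exact code) penalizes tokens attended more than 1.0, which fits summarization better:

```python
import torch

def coverage_wu(cov: torch.Tensor, beta: float) -> torch.Tensor:
    # GNMT-style: log-penalty for source tokens with total attention < 1.0
    # note: a token with zero total attention gives log(0) = -inf here,
    # which is one way predictions can end up with infinite scores
    return beta * -torch.clamp(cov, max=1.0).log().sum(-1)

def coverage_summary(cov: torch.Tensor, beta: float) -> torch.Tensor:
    # penalize source tokens whose total attention exceeds 1.0 (repetition)
    return beta * torch.clamp(cov - 1.0, min=0.0).sum(-1)

# cov: attention summed over decoding steps, shape (batch, src_len)
cov = torch.tensor([[0.2, 1.7, 0.9, 1.1]])
print(coverage_wu(cov, beta=1.0))       # penalizes the 0.2 and 0.9 tokens
print(coverage_summary(cov, beta=1.0))  # penalizes the 1.7 and 1.1 tokens
```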
It would be interesting to check the prediction scores, in particular to see how many sentences get `inf` scores.
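Something like this, assuming you dump one score per prediction to a file (hypothetical `scores.txt`):

```python
import math

# hypothetical dump: one log-likelihood score per predicted sentence
with open("scores.txt") as f:
    scores = [float(line) for line in f]

n_inf = sum(1 for s in scores if math.isinf(s))
print(f"{n_inf}/{len(scores)} predictions with infinite score")
```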
It actually seems like a mistake to me: with `a^t` included in the summation, we have `c^t >= a^t` elementwise (attention weights are non-negative), so `min(a^t, c^t) = a^t`, which does not really make sense as a loss.
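Quick numerical check of that claim: when `a^t` is inside the sum, the loss degenerates to `sum(a^t) = 1` at every step, so it carries no signal:

```python
import torch

T, S = 5, 8  # toy sizes: decoding steps, source length
attn = torch.softmax(torch.randn(T, S), dim=-1)  # a^0 .. a^{T-1}

t = 3
cov_excl = attn[:t].sum(0)      # c^t over t' < t (See et al., 2017)
cov_incl = attn[:t + 1].sum(0)  # buggy variant: a^t included in the sum

# a^t included => c^t >= a^t elementwise => the min is always a^t
assert torch.equal(torch.min(attn[t], cov_incl), attn[t])

print(torch.min(attn[t], cov_excl).sum())  # informative coverage loss
print(torch.min(attn[t], cov_incl).sum())  # always 1.0: useless as a loss
```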
Ok, it makes sense. I found some results where the difference was around 9 ROUGE points (on 11.5k sentences), which is not close at all. Maybe I made a mistake...
Sorry for the delay, could you open a PR for that? Thanks