
Coverage Mechanism and Coverage Loss

Open wanghm92 opened this issue 7 years ago • 8 comments

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and the coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder? These could potentially help alleviate the repetition problem in generation.

Or, any hints on a quick implementation? Thanks!
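
For concreteness, the coverage loss from the second paper penalizes attention that repeatedly lands on already-attended source positions. Here is a minimal sketch of that computation (my own illustration, not OpenNMT-tf code; the `attentions` tensor layout is an assumption, and padding masks are omitted):

```python
import tensorflow as tf

def coverage_loss(attentions):
    """Coverage loss from See et al. (2017):
    covloss_t = sum_i min(a_i^t, c_i^t), with c^t = sum_{t' < t} a^{t'}.

    attentions: [batch, target_len, source_len] attention weights.
    Masking of padded positions is omitted for brevity.
    """
    # Coverage at step t is the attention accumulated over all previous steps.
    coverage = tf.cumsum(attentions, axis=1, exclusive=True)
    # Overlap between the current attention and past coverage, summed over source.
    step_losses = tf.reduce_sum(tf.minimum(attentions, coverage), axis=2)
    # Average over target steps and batch; this would be added to the NLL loss
    # with a small weight.
    return tf.reduce_mean(step_losses)
```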

wanghm92 avatar Jul 25 '18 00:07 wanghm92

There are no plans to add these features but contributions are welcome.

It is presently a bit complicated to customize the RNN decoder as we use the high-level tf.contrib.seq2seq APIs. We might want to revise that at some point.
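
For anyone who wants to attempt a contribution, the heart of the coverage attention from Tu et al. (2016) is an extra coverage term in the attention energies. A rough, standalone sketch in plain TF 1.x ops (the function name, shapes, and linear layers are illustrative, not existing OpenNMT-tf code):

```python
import tensorflow as tf

def coverage_attention_step(query, keys, coverage, num_units):
    """e_{t,i} = v^T tanh(W_q s_t + W_k h_i + w_c c_{t,i})

    query:    [batch, query_dim]          decoder state s_t
    keys:     [batch, src_len, key_dim]   encoder outputs h_i
    coverage: [batch, src_len]            attention accumulated before step t
    """
    with tf.variable_scope("coverage_attention", reuse=tf.AUTO_REUSE):
        q = tf.layers.dense(query, num_units, use_bias=False, name="w_q")
        k = tf.layers.dense(keys, num_units, use_bias=False, name="w_k")
        c = tf.layers.dense(tf.expand_dims(coverage, -1), num_units,
                            use_bias=False, name="w_c")
        v = tf.get_variable("v", [num_units])
        # Energies over source positions, then a softmax to get alignments.
        energies = tf.reduce_sum(
            v * tf.tanh(tf.expand_dims(q, 1) + k + c), axis=-1)  # [batch, src_len]
        alignments = tf.nn.softmax(energies)
        # The updated coverage is carried over to the next decoding step.
        return alignments, coverage + alignments
```

The variable scope with reuse=tf.AUTO_REUSE matters because the function would be called once per decoding step.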

guillaumekln avatar Jul 25 '18 07:07 guillaumekln

@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn" which I have used to solve a problem somewhat similar to yours.

My use case is for learning a strictly token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during the translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.

kaihuchen avatar Jul 25 '18 15:07 kaihuchen

@guillaumekln @kaihuchen Thanks a lot for the replies! I came across the discussion on the "coverage_attn" option from OpenNMT-py, but I also found these lines in global_attention.py: https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142 Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience? The same question was asked on the forum but has had no response yet: http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651 Could you give some hints? Thanks!

wanghm92 avatar Jul 25 '18 16:07 wanghm92

@wanghm92 FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. Here is what I have observed from my experiments so far:

  • If I add the '-coverage_attn' option for training, then in the inferred results the constraint len(TARGET_SEQ)>=len(SRC_SEQ) seems to always hold, and the token-for-token mapping is much better behaved. This was not the case when I was using OpenNMT-tf. I have not traced into the source code, so I cannot confirm whether this implies that coverage_attn is fully functional as the designers intended.
  • In the above case I still see the repetition problem occasionally in the generated sequence (but still within the len constraint mentioned above). It is possible that this was because my model was still under-trained at the time when I sampled it.
  • There are some additional translate.py options, such as stepwise_penalty, coverage_penalty, and length_penalty, that seem relevant, but I have not played with them enough to know whether they are useful in this case (see the sketch after this list).
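
For reference, my understanding (my own reading, not verified against the OpenNMT-py source) is that the "wu" variant of coverage_penalty applies the beam-search coverage penalty from Wu et al. (2016), which rewards hypotheses whose total attention covers every source token. A toy NumPy sketch of that formula:

```python
import numpy as np

def wu_coverage_penalty(attentions, beta):
    """cp = beta * sum_i log(min(sum_t a_{t,i}, 1.0))

    attentions: [target_len, source_len] attention weights of one hypothesis.
    beta: strength of the penalty (the score gets more negative when some
          source tokens receive little total attention).
    """
    total_attention = attentions.sum(axis=0)  # total mass on each source token
    # Note: positions with zero attention give log(0) = -inf; real
    # implementations would clip with a small epsilon.
    return beta * np.log(np.minimum(total_attention, 1.0)).sum()
```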

kaihuchen avatar Jul 26 '18 01:07 kaihuchen

@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether it is still under development. I would appreciate a clarification from the developers (@guillaumekln) if possible. Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations later.

wanghm92 avatar Jul 26 '18 02:07 wanghm92

For any query about OpenNMT-py, please open an issue in the dedicated repository. Thanks.

guillaumekln avatar Jul 26 '18 07:07 guillaumekln

@guillaumekln

I see this discussion happened three years ago. Are there any plans to work on these features at the moment? Thank you!

tmkhalil avatar Jul 01 '21 10:07 tmkhalil

There is no plan to work on this at the moment, but I would accept a PR adding these features.

guillaumekln avatar Jul 01 '21 11:07 guillaumekln