
Coverage Mechanism and Coverage Loss

Open wanghm92 opened this issue 7 years ago • 8 comments

May I ask if there is any plan to add the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and the coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder? These could potentially help alleviate the repetition problem in generation.

Or, any hints on a quick implementation? Thanks!
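
For concreteness, the coverage loss from the second paper penalizes attention that repeatedly lands on already-attended source positions. Here is a minimal sketch of that computation (my own illustration, not OpenNMT-tf code; the `attentions` tensor layout is an assumption, and padding masks are omitted):

```python
import tensorflow as tf

def coverage_loss(attentions):
    """Coverage loss from See et al. (2017):
    covloss_t = sum_i min(a_i^t, c_i^t), with c^t = sum_{t' < t} a^{t'}.

    attentions: [batch, target_len, source_len] attention weights.
    Masking of padded positions is omitted for brevity.
    """
    # Coverage at step t is the attention accumulated over all previous steps.
    coverage = tf.cumsum(attentions, axis=1, exclusive=True)
    # Overlap between the current attention and past coverage, summed over source.
    step_losses = tf.reduce_sum(tf.minimum(attentions, coverage), axis=2)
    # Average over target steps and batch; this would be added to the NLL loss
    # with a small weight.
    return tf.reduce_mean(step_losses)
```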

wanghm92 avatar Jul 25 '18 00:07 wanghm92

There are no plans to add these features but contributions are welcome.

It is presently a bit complicated to customize the RNN decoder as we use the high-level tf.contrib.seq2seq APIs. We might want to revise that at some point.
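
For anyone who wants to attempt a contribution, the heart of the coverage attention from Tu et al. (2016) is an extra coverage term in the attention energies. A rough, standalone sketch in plain TF 1.x ops (the function name, shapes, and linear layers are illustrative, not existing OpenNMT-tf code):

```python
import tensorflow as tf

def coverage_attention_step(query, keys, coverage, num_units):
    """e_{t,i} = v^T tanh(W_q s_t + W_k h_i + w_c c_{t,i})

    query:    [batch, query_dim]          decoder state s_t
    keys:     [batch, src_len, key_dim]   encoder outputs h_i
    coverage: [batch, src_len]            attention accumulated before step t
    """
    with tf.variable_scope("coverage_attention", reuse=tf.AUTO_REUSE):
        q = tf.layers.dense(query, num_units, use_bias=False, name="w_q")
        k = tf.layers.dense(keys, num_units, use_bias=False, name="w_k")
        c = tf.layers.dense(tf.expand_dims(coverage, -1), num_units,
                            use_bias=False, name="w_c")
        v = tf.get_variable("v", [num_units])
        # Energies over source positions, then a softmax to get alignments.
        energies = tf.reduce_sum(
            v * tf.tanh(tf.expand_dims(q, 1) + k + c), axis=-1)  # [batch, src_len]
        alignments = tf.nn.softmax(energies)
        # The updated coverage is carried over to the next decoding step.
        return alignments, coverage + alignments
```

The variable scope with reuse=tf.AUTO_REUSE matters because the function would be called once per decoding step.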

guillaumekln avatar Jul 25 '18 07:07 guillaumekln

@wanghm92 In case you are not aware, OpenNMT-py does support a training option called "coverage_attn" which I have used to solve a problem somewhat similar to yours.

My use case is for learning a strictly token-by-token mapping from the source sequence to the target sequence, which does not allow for any unwanted repetition or additional/missing tokens during the translation. This is hard to enforce under OpenNMT-tf, but so far OpenNMT-py seems to work well for my purposes.

kaihuchen avatar Jul 25 '18 15:07 kaihuchen

@guillaumekln @kaihuchen Thanks a lot for the replies! I came across the discussion on the "coverage_attn" option from OpenNMT-py, but I also found these lines in global_attention.py: https://github.com/OpenNMT/OpenNMT-py/blob/fd1ec04758855008dbbf7ce1d56d16570544e616/onmt/modules/global_attention.py#L135-L142 Does that mean coverage attention is still not supported? Or, @kaihuchen, does the option indeed work in your experience? The same question was asked on the forum but has had no response yet: http://forum.opennmt.net/t/whats-the-use-of-coverage-in-the-forward-pass-for-globalattention/1651 Could you give some hints? Thanks!

wanghm92 avatar Jul 25 '18 16:07 wanghm92

@wanghm92 FYI, I have been trying out the coverage_attn feature in OpenNMT-py since just yesterday. Here is what I have observed from my experiments so far:

  • If I add the '-coverage_attn' option for training, then in the inferred results the constraint len(TARGET_SEQ)>=len(SRC_SEQ) seems to always hold, and the token-for-token mapping is much better behaved. This was not the case when I was using OpenNMT-tf. I have not traced into the source code, so I cannot confirm whether this implies that coverage_attn is fully functional as the designers intended.
  • In the above case I still see the repetition problem occasionally in the generated sequence (but still within the len constraint mentioned above). It is possible that this was because my model was still under-trained at the time when I sampled it.
  • There are some additional translate.py options, such as stepwise_penalty, coverage_penalty, and length_penalty, that seem relevant, but I have not played with them enough to know whether they are useful in this case (see the sketch after this list).
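
For reference, my understanding (my own reading, not verified against the OpenNMT-py source) is that the "wu" variant of coverage_penalty applies the beam-search coverage penalty from Wu et al. (2016), which rewards hypotheses whose total attention covers every source token. A toy NumPy sketch of that formula:

```python
import numpy as np

def wu_coverage_penalty(attentions, beta):
    """cp = beta * sum_i log(min(sum_t a_{t,i}, 1.0))

    attentions: [target_len, source_len] attention weights of one hypothesis.
    beta: strength of the penalty (the score gets more negative when some
          source tokens receive little total attention).
    """
    total_attention = attentions.sum(axis=0)  # total mass on each source token
    # Note: positions with zero attention give log(0) = -inf; real
    # implementations would clip with a small epsilon.
    return beta * np.log(np.minimum(total_attention, 1.0)).sum()
```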

kaihuchen avatar Jul 26 '18 01:07 kaihuchen

@kaihuchen I see. I'm not sure whether the developers forgot to delete the 'not supported' note or whether it is still under development. I would appreciate a clarification from the developers (@guillaumekln) if possible. Thank you very much for your detailed explanations! I'll go and try out those options myself and share my observations later.

wanghm92 avatar Jul 26 '18 02:07 wanghm92

For any query about OpenNMT-py, please open an issue in the dedicated repository. Thanks.

guillaumekln avatar Jul 26 '18 07:07 guillaumekln

@guillaumekln

I see this discussion happened three years ago. Are there any plans to work on these features at the moment? Thank you!

tmkhalil avatar Jul 01 '21 10:07 tmkhalil

There is no plan to work on this at the moment, but I would accept a PR adding these features.

guillaumekln avatar Jul 01 '21 11:07 guillaumekln