
Add bleu

Open coder1729 opened this issue 5 years ago • 9 comments

Added BLEU from AllenNLP instead of what @vince62s suggested, since there tensors can be used directly and validation with minibatches works. With the corpus_bleu approach @vince62s referenced, the entire validation set would be required at once. This implementation also already excludes the pad tokens.

https://github.com/OpenNMT/OpenNMT-py/issues/1158

coder1729 avatar Jan 09 '19 07:01 coder1729
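The minibatch accumulation described in the comment above can be sketched roughly like this (a pure-Python illustration of the idea behind a streaming BLEU metric, not the AllenNLP or OpenNMT-py code; all names here are hypothetical): clipped n-gram match counts and totals are accumulated per batch, so the full validation set never has to be held in memory at once.

```python
from collections import Counter
import math

class StreamingBLEU:
    """Accumulates clipped n-gram statistics across minibatches."""

    def __init__(self, max_n=4):
        self.max_n = max_n
        self.matches = Counter()   # clipped n-gram matches, keyed by order
        self.totals = Counter()    # candidate n-gram counts, keyed by order
        self.pred_len = 0
        self.ref_len = 0

    @staticmethod
    def _ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def update(self, predictions, references):
        # predictions/references: lists of token sequences for one minibatch,
        # assumed to already have pad tokens stripped
        for pred, ref in zip(predictions, references):
            self.pred_len += len(pred)
            self.ref_len += len(ref)
            for n in range(1, self.max_n + 1):
                pred_ngrams = self._ngrams(pred, n)
                ref_ngrams = self._ngrams(ref, n)
                self.totals[n] += sum(pred_ngrams.values())
                # clip counts by the reference (modified precision)
                self.matches[n] += sum(min(c, ref_ngrams[g])
                                       for g, c in pred_ngrams.items())

    def score(self):
        if any(self.matches[n] == 0 for n in range(1, self.max_n + 1)):
            return 0.0
        log_prec = sum(math.log(self.matches[n] / self.totals[n])
                       for n in range(1, self.max_n + 1)) / self.max_n
        # brevity penalty against the accumulated corpus lengths
        bp = 1.0 if self.pred_len > self.ref_len \
            else math.exp(1 - self.ref_len / self.pred_len)
        return bp * math.exp(log_prec)
```

Because only the running counts are kept, `update` can be called once per validation minibatch and `score` once at the end of the pass.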

Thanks for the contribution; however, you will need to change the logic. For a first PR you just need to add BLEU as an extra validation metric (look at how accuracy and ppl are handled in the validation process). You don't need to change the loss-related functions. Maybe later on we can change the loss function itself to implement other things, but that is not the point at this stage.

vince62s avatar Jan 09 '19 08:01 vince62s
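What "BLEU at the same level as PPL or ACC" could look like, as a minimal hypothetical sketch (this mirrors the shape of OpenNMT-py's `Statistics` accumulator but is not its actual code): the loss path is untouched, and BLEU is just one more value reported alongside the existing validation metrics.

```python
import math

class ValidStats:
    """Toy validation-statistics accumulator with BLEU as an extra metric."""

    def __init__(self):
        self.loss = 0.0
        self.n_words = 0
        self.n_correct = 0
        self.bleu = None  # filled in once per validation pass, if computed

    def update(self, loss, n_words, n_correct):
        # called per minibatch; loss functions are unchanged
        self.loss += loss
        self.n_words += n_words
        self.n_correct += n_correct

    def accuracy(self):
        return 100.0 * self.n_correct / self.n_words

    def ppl(self):
        # cap the exponent to avoid overflow on diverging models
        return math.exp(min(self.loss / self.n_words, 100))

    def report(self):
        parts = ["acc: %.2f" % self.accuracy(), "ppl: %.2f" % self.ppl()]
        if self.bleu is not None:
            parts.append("bleu: %.2f" % self.bleu)
        return "; ".join(parts)
```

The point is that BLEU never feeds back into training; it is only read out in the report, exactly like accuracy and perplexity.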

@vince62s so do you want all the members of the BLEU class to be included in the Statistics class, with updates to precision_matches etc. as part of the batch_stats update?

coder1729 avatar Jan 10 '19 06:01 coder1729

No, the class in its own file is fine. I am just saying that BLEU is another metric at the same level as PPL or ACC, that's it.

vince62s avatar Jan 10 '19 07:01 vince62s

@vince62s please see how it looks now. Thanks!

coder1729 avatar Jan 10 '19 08:01 coder1729

@vince62s removing the batch size requires a small refactor. Please have a look; also moved the code to bleu.py.

coder1729 avatar Jan 13 '19 05:01 coder1729

Hi. It's really nice to have this feature. I tried it and I have a silly question about it. With this feature enabled, I saw very small BLEU scores on the dev set during training. However, when I used the same models to translate the same dev set and computed BLEU on the results, the scores were much higher (and more reasonable), even with beam size = 1. I am trying to understand why the two cases differ, but I don't have the answer yet. Am I doing something wrong, or do you see the same problem? Do you have any idea about this?

mhn226 avatar Mar 25 '19 16:03 mhn226

Hi again. I tried to investigate this problem and found that the BLEU score varied greatly when valid_batch_size changed significantly. In detail, increasing valid_batch_size seemed to increase the number of predicted tokens, and consequently precision_totals also increased (for every n-gram order), whereas precision_matches did not seem to depend on valid_batch_size. The count clipping seemed to work fine here. To summarize: when valid_batch_size increases, precision_totals increases significantly for every n-gram order while precision_matches stays the same, which obviously lowers the BLEU score. I prefer to set valid_batch_size = 1 in this case.

mhn226 avatar Mar 26 '19 18:03 mhn226
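The symptom described above is consistent with pad tokens leaking into the predicted-token count. A toy illustration (hypothetical code, not OpenNMT-py's): when sequences are padded to the batch's max length and the pads are not filtered out of the candidate, the denominator (precision_totals) grows with batch padding while the matches stay flat, dragging the score down.

```python
from collections import Counter

PAD = "<pad>"

def unigram_precision(pred, ref, ignore_pad):
    """Return (clipped matches, candidate length) for unigrams."""
    if ignore_pad:
        pred = [t for t in pred if t != PAD]
    pred_counts, ref_counts = Counter(pred), Counter(ref)
    matches = sum(min(c, ref_counts[t]) for t, c in pred_counts.items())
    return matches, len(pred)

ref = ["a", "b", "c"]
# prediction padded to the batch's max sequence length
pred = ["a", "b", "c", PAD, PAD, PAD]

m_clean, t_clean = unigram_precision(pred, ref, ignore_pad=True)
m_padded, t_padded = unigram_precision(pred, ref, ignore_pad=False)
# matches are identical either way, but the unfiltered total counts the pads,
# so precision drops from 3/3 to 3/6
```

Bigger batches mean more padding per sentence, which would explain why the score keeps falling as valid_batch_size grows.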

Any update on this one? It would be great to have this. Considering that it is already implemented in the TF version, some inspiration can be found there.

Preferably the BLEU implementation used should be identical between the TF and PT versions of onmt (!)

BramVanroy avatar Jul 27 '20 11:07 BramVanroy

You can display the BLEU scores in TensorBoard by adding another scalar at the end of statistics.py!

sajeedmehrab avatar Aug 22 '20 10:08 sajeedmehrab
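A hedged sketch of that suggestion: once BLEU is available on the stats object, logging it is one more `add_scalar` call wherever the other validation scalars are written. A stand-in writer class is used below so the example runs without TensorBoard installed; in practice you would pass a `torch.utils.tensorboard.SummaryWriter`, whose `add_scalar(tag, value, step)` has the same shape. The function and tag names are illustrative, not OpenNMT-py's.

```python
class FakeWriter:
    """Stand-in for a TensorBoard SummaryWriter; records add_scalar calls."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, step):
        self.scalars.append((tag, value, step))

def log_valid_scalars(writer, prefix, step, ppl, acc, bleu=None):
    # mirrors the pattern of logging one scalar per validation metric
    writer.add_scalar(prefix + "/ppl", ppl, step)
    writer.add_scalar(prefix + "/accuracy", acc, step)
    if bleu is not None:
        writer.add_scalar(prefix + "/bleu", bleu, step)

writer = FakeWriter()
log_valid_scalars(writer, "valid", 1000, ppl=5.3, acc=62.1, bleu=24.7)
```

The same three lines, pointed at a real `SummaryWriter`, would make the BLEU curve show up next to ppl and accuracy in TensorBoard.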