DSTC10-MOD
Why is BLEU greater than 1?
According to Tables 4 and 6 of your paper, Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark, the reported BLEU scores are greater than 1. This contradicts the definition of BLEU, whose values must lie between 0 and 1. I also checked task1_score.py and did not find any scaling factor applied in the BLEU calculation. I look forward to your reply.
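I noticed that the authors implemented the BLEU score incorrectly in task1_score.py. Specifically, they compute corpus BLEU (corpus_bleu from NLTK) on each single (reference, hypothesis) pair and then average those scores over the corpus (dividing by the number of samples). That is incorrect: corpus-level BLEU pools the n-gram counts and lengths across all pairs before computing a single score, so the average of per-pair scores is a different quantity.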
By the way, I think the BLEU numbers in the paper are multiplied by 100 so that they read as percentages.
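To make the difference between the two aggregation strategies concrete, here is a minimal sketch in pure Python. It is not the code from task1_score.py and not NLTK's implementation. It is a simplified, unsmoothed, single-reference BLEU written only for illustration, and the function names (bleu_stats, bleu_from_stats, pooled_corpus_bleu, averaged_pair_bleu) and the toy sentences are my own assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_stats(reference, hypothesis, max_n=4):
    """Clipped n-gram matches and totals for one (reference, hypothesis) pair."""
    matches, totals = [], []
    for n in range(1, max_n + 1):
        ref_counts = ngrams(reference, n)
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches.append(sum(min(c, ref_counts[g]) for g, c in hyp_counts.items()))
        totals.append(sum(hyp_counts.values()))
    return matches, totals, len(reference), len(hypothesis)


def bleu_from_stats(matches, totals, ref_len, hyp_len):
    """Unsmoothed BLEU: brevity penalty times geometric mean of precisions."""
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / len(matches)
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def pooled_corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: pool counts over all pairs, then score once."""
    matches, totals = [0] * max_n, [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        m, t, r, h = bleu_stats(ref, hyp, max_n)
        matches = [a + b for a, b in zip(matches, m)]
        totals = [a + b for a, b in zip(totals, t)]
        ref_len += r
        hyp_len += h
    return bleu_from_stats(matches, totals, ref_len, hyp_len)


def averaged_pair_bleu(references, hypotheses, max_n=4):
    """Score each pair separately, then average (the pattern described above)."""
    scores = [bleu_from_stats(*bleu_stats(r, h, max_n))
              for r, h in zip(references, hypotheses)]
    return sum(scores) / len(scores)


# Toy data (hypothetical): one exact match and one partial match.
refs = ["the cat sat on the mat".split(),
        "there is a dog in the garden".split()]
hyps = ["the cat sat on the mat".split(),
        "a dog is in the garden".split()]

pooled = pooled_corpus_bleu(refs, hyps)
averaged = averaged_pair_bleu(refs, hyps)

print(f"pooled corpus BLEU:    {pooled:.4f}")
print(f"averaged per-pair BLEU: {averaged:.4f}")
# Scaling by 100 only changes the display; the score itself stays in [0, 1].
print(f"pooled corpus BLEU as a percentage: {100 * pooled:.2f}")
```

On this toy data the second pair has no matching 4-gram, so its per-pair score is 0 and the average is 0.5, while the pooled score is about 0.65. Both values stay within [0, 1], and a reported value above 1 would be consistent with the scores having been multiplied by 100 for display, which is only my guess about the paper's tables.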