torcheval
fix: correct reference length calculation
Summary
This PR fixes the way the brevity penalty (specifically the effective reference corpus length) is calculated in BLEU.
Previously, `len_reference` was calculated as `min([len(ref) for ref in references_tokenized])`. However, this is incorrect: according to the original BLEU paper, we need the "best match length" (the reference length closest to the candidate length), not the minimum reference length. For more information, see the Wikipedia section on the brevity penalty and the NLTK implementation; a sketch of the rule is shown below.
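For illustration, here is a minimal sketch of the "best match length" rule and the resulting brevity penalty, following the NLTK convention of breaking ties toward the shorter reference. The function names are hypothetical and this is not the actual torcheval code.

```python
import math

def closest_ref_length(references_tokenized, len_candidate):
    # "Best match length": the reference length closest to the candidate
    # length; ties are broken toward the shorter reference (NLTK behaviour).
    return min(
        (len(ref) for ref in references_tokenized),
        key=lambda ref_len: (abs(ref_len - len_candidate), ref_len),
    )

def brevity_penalty(len_reference, len_candidate):
    # BP = 1 if the candidate is longer than the effective reference length,
    # otherwise exp(1 - r / c), as in Papineni et al. (2002).
    if len_candidate > len_reference:
        return 1.0
    return math.exp(1.0 - len_reference / len_candidate)
```

For example, with reference lengths 5 and 9 and a candidate of length 8, the old rule takes 5 (so BP = 1), while the best match length is 9, giving BP = exp(1 - 9/8) ≈ 0.88.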
Test plan
I added another unit test to `test_bleu.py` and compared the results of the calculations to those of the `nltk.translate.bleu_score.corpus_bleu` function to make sure the implementation is correct, as sketched below.
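As a rough illustration of that cross-check (not the actual test in `test_bleu.py`, and assuming the functional API `torcheval.metrics.functional.bleu_score` with made-up sentences), the comparison could look like:

```python
import math

from nltk.translate.bleu_score import corpus_bleu
from torcheval.metrics.functional import bleu_score

candidates = ["the cat is on the mat today"]
references = [["the cat is on the mat", "there is a big cat on the mat"]]

# NLTK expects pre-tokenized input: a list of reference lists and a list of
# hypotheses, each given as token lists.
nltk_score = corpus_bleu(
    [[ref.split() for ref in refs] for refs in references],
    [cand.split() for cand in candidates],
)

# torcheval's functional BLEU is assumed here to take the candidate strings
# and their references directly and to return a tensor.
torcheval_score = float(bleu_score(candidates, references))

assert math.isclose(nltk_score, torcheval_score, rel_tol=1e-4)
```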
@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi, is there a way for me to see why the test is failing so that I can fix the problem?
Hi @yuxqiu, thanks for this contribution! It seems some files unrelated to BLEU have been reformatted in a way that causes our linter to error; do you mind undoing those changes?
@JKSenthil I've finished undoing all those changes.
@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi @yuxqiu, thanks for reverting! We have identified the linter issue as being on our end; we'll land a fix first and then rerun these tests :)
@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.