nlg-eval

Fix calculation error when ref is empty

Open voidful opened this issue 5 years ago • 3 comments

This pull request fixes the error that occurs when any of the references is empty.

When hypotheses have different numbers of references, I pad the missing references with empty strings, and that padding causes an error. For example:

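from nlgeval import NLGEval

n = NLGEval()  # loads the metric models

# ref_list holds one inner list per reference set; each set has one reference per hypothesis.
# sentence2 has only one real reference, so its slot in the first set is padded with "".
scores = n.compute_metrics(
    ref_list=[
        [
            "this is one reference sentence for sentence1",
            "",
        ],
        [
            "this is one more reference sentence for sentence1",
            "this is the second reference sentence for sentence2",
        ],
    ],
    hyp_list=[
        "this is the model generated sentence1 which seems good enough",
        "this is sentence2 which has been generated by your model",
    ],
)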
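For context, the padding described above can be sketched as follows. `pad_references` is a hypothetical helper, not part of nlg-eval; it assumes the layout `compute_metrics` expects, namely one inner list per reference set, each holding one reference per hypothesis. It left-pads shorter per-hypothesis reference lists with empty strings, which reproduces the `ref_list` in the example.

```python
from typing import List


def pad_references(per_hyp_refs: List[List[str]]) -> List[List[str]]:
    """Turn per-hypothesis reference lists into padded reference sets.

    per_hyp_refs[i] holds every reference for hypothesis i (hypothetical input format).
    Shorter lists are left-padded with "" so that every hypothesis has the same
    number of references, then the lists are transposed into reference sets.
    """
    max_refs = max((len(refs) for refs in per_hyp_refs), default=0)
    padded = [[""] * (max_refs - len(refs)) + refs for refs in per_hyp_refs]
    # zip(*padded) yields one tuple per reference set, one entry per hypothesis.
    return [list(reference_set) for reference_set in zip(*padded)]


per_hyp_refs = [
    [
        "this is one reference sentence for sentence1",
        "this is one more reference sentence for sentence1",
    ],
    ["this is the second reference sentence for sentence2"],
]
ref_list = pad_references(per_hyp_refs)
print(ref_list)
```

Which reference set receives the empty string is arbitrary; the point is that any hypothesis with fewer references ends up compared against at least one "".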

voidful, Jan 05 '20 08:01

CLA assistant check
All CLA requirements met.

msftclas, Jan 05 '20 08:01

Thanks for pointing this out. The references are the targets that the generated hypotheses should match. A target could legitimately be an empty string, so I think we should fix whatever is actually causing the error instead of silently dropping empty strings. Dropping them could shift the lists out of alignment, so that a hypothesis is compared with a target it was never meant to be compared with. For example:

References:

  1. "Sentence 1"
  2. "" (this one would get filtered out)
  3. "Sentence 3"

Hypotheses:

  1. "Sentence 1"
  2. "Sentence 2"
  3. "Sentence 3"
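A minimal illustration in plain Python (not nlg-eval code), assuming references and hypotheses are paired by position: filtering out the empty reference shifts every later reference up by one slot.

```python
references = ["Sentence 1", "", "Sentence 3"]
hypotheses = ["Sentence 1", "Sentence 2", "Sentence 3"]

# Silently dropping empty references shortens the list...
filtered = [ref for ref in references if ref]

# ...so positional pairing now matches hypothesis 2 with reference 3,
# and zip() drops hypothesis 3 entirely because it stops at the shorter list.
pairs = list(zip(hypotheses, filtered))
print(pairs)  # [('Sentence 1', 'Sentence 1'), ('Sentence 2', 'Sentence 3')]
```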

juharris, Jan 05 '20 21:01

I agree that we should fix what is actually causing the error. To put it more generally, the error occurs when either a reference or a hypothesis is empty:

ref=["this is a test", ""]   # one reference is empty
hyp="this is a good test"

ref=["this is a good test"]
hyp=""                       # the hypothesis is empty

The vector-based metrics (Skip-thought and glove_metrics) raise an error because they receive an empty input to encode, so the following commit will try to fix this.
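One way such a guard could look, sketched for an embedding-average cosine similarity of the kind the GloVe-based metrics compute. Everything here is hypothetical: the function names, the toy embedding dictionary, and the choice to return a zero vector and a score of 0.0 for empty input are assumptions, not nlg-eval's code or the fix in the follow-up commit.

```python
import math
from typing import Dict, List


def average_embedding(text: str, emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the word vectors for known tokens; a zero vector if there are none."""
    vectors = [emb[tok] for tok in text.split() if tok in emb]
    if not vectors:
        # Empty (or fully out-of-vocabulary) input: avoid averaging zero vectors.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity that returns 0.0 instead of dividing by a zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A comparison involving an empty input is scored as zero similarity.
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Toy 2-d embeddings, purely illustrative.
emb = {"this": [1.0, 0.0], "test": [0.0, 1.0]}
print(cosine(average_embedding("this is a good test", emb, 2),
             average_embedding("", emb, 2)))
```

Scoring an empty reference against an empty hypothesis as 0.0 is itself a design choice; a real fix might prefer to treat that pair as a perfect match or to skip it.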

voidful, Jan 06 '20 15:01