CRSLab
Bugs in evaluator
When doing the ind2txt, we get back the decoded string. Then, if we calculate the n-grams directly on that string, the evaluator collects unique n-grams at character granularity rather than token granularity.
Example (correct tokens vs. the character-level result produced by the code):
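A minimal sketch of the difference, using nltk's `ngrams` helper as in the evaluator; the example sentence is made up for illustration:

```python
from nltk import ngrams

hyp = "i like action movies"  # hypothetical decoded string from ind2txt

# Passing the raw string iterates over characters, so the 2-grams are character pairs.
char_bigrams = set(ngrams(hyp, 2))
print(list(char_bigrams)[:3])  # e.g. ('i', ' '), (' ', 'l'), ('l', 'i'), ...

# Splitting into tokens first gives the intended word-level 2-grams.
token_bigrams = set(ngrams(hyp.split(), 2))
print(token_bigrams)  # {('i', 'like'), ('like', 'action'), ('action', 'movies')}
```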
Hi @Oran-Ac, following the issue here, I modified gen_evaluate by adding a simple split, which gives me the correct tokens instead of characters.
def gen_evaluate(self, hyp, refs):
    if hyp:
        self.gen_metrics.add("f1", F1Metric.compute(hyp, refs))
        for k in range(1, 5):
            self.gen_metrics.add(f"bleu@{k}", BleuMetric.compute(hyp, refs, k))
            # split the sentence into tokens here
            hyp_token = hyp.split()
            for token in ngrams(hyp_token, k):
                self.dist_set[f"dist@{k}"].add(token)
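For context, the corrected dist@k is essentially a distinct-n statistic over token-level n-grams. Below is a standalone sketch of that statistic (unique n-grams divided by total n-grams); the function name and normalization are illustrative and may differ from how CRSLab reports the metric:

```python
from nltk import ngrams

def distinct_n(hypotheses, n):
    """Ratio of unique token-level n-grams to total n-grams across all hypotheses."""
    unique, total = set(), 0
    for hyp in hypotheses:
        grams = list(ngrams(hyp.split(), n))  # token granularity, as in the fix above
        unique.update(grams)
        total += len(grams)
    return len(unique) / total if total else 0.0

print(distinct_n(["i like action movies", "i like comedy movies"], 2))  # 5 unique / 6 total
```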
However, I am still unable to get reasonable dist@k scores compared to the original KGSF paper. See #44.
Did you reproduce similar results using the CRSLab toolbox, by any chance?
Thanks
Hi @icedpanda, sorry for the late reply. Thanks for your code; we will fix it soon.
I found the same phenomenon with the dist@k metric. After discussing with the author of KGSF, we think it is normal behavior: as the number of training epochs increases, the metric quickly increases as well. You don't need to worry about that.