CRSLab
Bugs in evaluator
When doing the ind2txt, we get back the decoded string. Then, if we calculate the n-grams directly on that string, the evaluator collects unique n-grams at character granularity rather than token granularity.
Example (correct tokens vs. the character-level result produced by the code):
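A minimal sketch of the difference, using nltk's `ngrams` helper as in the evaluator; the example sentence is made up for illustration:

```python
from nltk import ngrams

hyp = "i like action movies"  # hypothetical decoded string from ind2txt

# Passing the raw string iterates over characters, so the 2-grams are character pairs.
char_bigrams = set(ngrams(hyp, 2))
print(list(char_bigrams)[:3])  # e.g. ('i', ' '), (' ', 'l'), ('l', 'i'), ...

# Splitting into tokens first gives the intended word-level 2-grams.
token_bigrams = set(ngrams(hyp.split(), 2))
print(token_bigrams)  # {('i', 'like'), ('like', 'action'), ('action', 'movies')}
```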
Hi @Oran-Ac, following the issue here, I modified gen_evaluate by adding a simple split, which gives me the correct tokens instead of characters.
def gen_evaluate(self, hyp, refs):
    if hyp:
        self.gen_metrics.add("f1", F1Metric.compute(hyp, refs))
        for k in range(1, 5):
            self.gen_metrics.add(f"bleu@{k}", BleuMetric.compute(hyp, refs, k))
            # split the sentence into tokens here
            hyp_token = hyp.split()
            for token in ngrams(hyp_token, k):
                self.dist_set[f"dist@{k}"].add(token)
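For context, the corrected dist@k is essentially a distinct-n statistic over token-level n-grams. Below is a standalone sketch of that statistic (unique n-grams divided by total n-grams); the function name and normalization are illustrative and may differ from how CRSLab reports the metric:

```python
from nltk import ngrams

def distinct_n(hypotheses, n):
    """Ratio of unique token-level n-grams to total n-grams across all hypotheses."""
    unique, total = set(), 0
    for hyp in hypotheses:
        grams = list(ngrams(hyp.split(), n))  # token granularity, as in the fix above
        unique.update(grams)
        total += len(grams)
    return len(unique) / total if total else 0.0

print(distinct_n(["i like action movies", "i like comedy movies"], 2))  # 5 unique / 6 total
```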
However, I am still unable to get reasonable dist@k scores compared to the original KGSF paper. See #44.
Did you reproduce similar results using the CRSLab toolbox, by any chance?
Thanks
Hi @icedpanda, sorry for the late reply. Thanks for your code; we will fix it soon.
I found the same phenomenon with the dist@k metric. After discussing with the author of KGSF, we think it is normal behavior: as the number of training epochs increases, the metric quickly increases as well. You don't need to worry about that.