
How to score with the new BWB?


I would like to score an output on blonde categories, but it's not quite clear how to do this.

The top-level README describes an annotation format, but that doesn't match the format of the BWB files: there are no ner_re.txt or an.txt files, just files of the form {section}.ref_re.an.txt. Furthermore, there does not appear to be any documentation of this new embedded format.

I suppose I could use the BWBReader class to dump the annotation format used by BlonDe. This seems straightforward with the nice reader, but it introduces an element of interpretation (and potential error) that means I can't compare my scores to those reported in the paper. It is not clear exactly how you computed the scores, beyond the note in Table 7 that you computed both F1 and exact match. For example, are statistics computed at the sentence level or the document level?

Can you share example code that parses test_with_annotations into the format BlonDe expects?
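For concreteness, here is the kind of glue step I have in mind, sketched in Python. Everything here is hypothetical: `dump_parallel_files` and the triples are my own names, standing in for whatever the BWBReader actually produces.

```python
# Hypothetical glue: given (source, reference, annotation) triples recovered
# from BWB/BWB_dataset/test_with_annotations (e.g. via the BWBReader class),
# write the three line-aligned files that the scoring pipeline below expects.
# None of these names are part of the BWB/BlonDe API; this is just the shape
# of the dump step I am asking about.

def dump_parallel_files(triples, src_path, ref_path, ann_path):
    """Write one sentence per line, keeping the three files aligned."""
    with open(src_path, "w", encoding="utf-8") as src_f, \
         open(ref_path, "w", encoding="utf-8") as ref_f, \
         open(ann_path, "w", encoding="utf-8") as ann_f:
        for source, reference, annotation in triples:
            src_f.write(source.strip() + "\n")
            ref_f.write(reference.strip() + "\n")
            ann_f.write(annotation.strip() + "\n")

# Toy stand-in for whatever the reader returns:
triples = [
    ("src sentence 1", "ref sentence 1", "ANN 1"),
    ("src sentence 2", "ref sentence 2", "ANN 2"),
]
dump_parallel_files(triples, "input.txt", "ref.txt", "annotations.txt")
```

The point is just that the three files stay line-aligned, so the decoder and scorer can consume them as in the pipeline below.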

Ideally I would just like to do the following to compare scores:

# create input file from BWB/BWB_dataset/test_with_annotations
your_command > input.txt
your_command > ref.txt
your_command > annotations.txt
# (or whatever)

# translate it
cat input.txt | my_decoder > output.txt

# score it with blonde (for all five cats: ambiguity, entity, tense, pronoun, ellipsis)
blonde -r ref.txt -s output.txt -p -ner annotations.txt
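On the sentence-level vs document-level question above: the two aggregations can give different numbers on identical matches, which is why I'd like it pinned down. A toy illustration (my own code, not BlonDe's implementation):

```python
# Why aggregation level matters: micro-averaged F1 (pool counts over the
# whole document) vs macro-averaged F1 (score each sentence, then average)
# differ on the same per-sentence counts. This is my own sketch, not
# BlonDe's implementation.

def f1(tp, fp, fn):
    if tp == 0:
        return 0.0
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# (tp, fp, fn) counts per sentence for one category, e.g. pronouns
sentences = [(3, 0, 0), (0, 2, 2)]

micro = f1(*[sum(c) for c in zip(*sentences)])           # document level: 0.6
macro = sum(f1(*s) for s in sentences) / len(sentences)  # sentence level: 0.5
```

Knowing which of these (or something else) was used in Table 7 would let me reproduce the reported numbers.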

(Even more ideally, you would provide a system output from the paper, so that I could verify that I have the correct results.)

mjpost, Dec 04 '23 22:12