Perl and Python implementations give different scores with duplicates
Take a file with duplicates:
#begin document (Dups);
test1 0 0 a1 (0)|(1)
test1 0 1 a2 -
test1 0 2 junk -
test1 0 3 b1 (1)
test1 0 4 b2 -
test1 0 5 b3 -
test1 0 6 b4 -
test1 0 7 jnk -
test1 0 8 . -
#end document
Cf. the attached dups.txt. The (0)|(1) annotation on a1 places that token in two entities at once: entity 0 is the singleton {a1} and entity 1 is {a1, b1}, so a1 is a duplicated mention.
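To make that concrete, here is a rough sketch of parsing the coref column into entities, assuming plain CoNLL-2012 bracket notation. This illustrative helper is hypothetical, not coval's actual parser, and a real one would need a stack to handle nested mentions of the same entity:

from collections import defaultdict

def parse_coref_column(labels):
    # Map entity id -> list of (start, end) token spans.
    entities = defaultdict(list)
    open_spans = {}  # entity id -> start token of a currently open mention
    for tok, label in enumerate(labels):
        if label == "-":
            continue
        for part in label.split("|"):
            eid = int(part.strip("()"))
            if part.startswith("("):
                open_spans[eid] = tok
            if part.endswith(")"):
                entities[eid].append((open_spans.pop(eid, tok), tok))
    return dict(entities)

# Coref column of dups.txt, one label per token:
labels = ["(0)|(1)", "-", "-", "(1)", "-", "-", "-", "-", "-"]
print(parse_coref_column(labels))
# {0: [(0, 0)], 1: [(0, 0), (3, 3)]} -- span (0, 0) sits in both entities,
# matching the "Entity 0" / "Entity 1" lines in the Perl scorer's output below.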
The two LEA implementations give different scores:
andreas@thinkpad:~/code/coval/% ~/src/reference-coreference-scorers/scorer.pl lea /tmp/dups.txt /tmp/dups.txt
version: 9.0.0-alpha /home/andreas/src/reference-coreference-scorers/lib/CorScorer.pm
====> (Dups);:
File (Dups);:
Entity 0: (0,0)
Entity 1: (0,0) (3,3)
====> (Dups);:
File (Dups);:
Entity 0: (0,0)
Entity 1: (0,0) (3,3)
(Dups);:
Repeated mention in the key: 0, 0 01
Repeated mention in the response: 0, 0 11
Total key mentions: 2
Total response mentions: 2
Strictly correct identified mentions: 2
Partially correct identified mentions: 0
No identified: 0
Invented: 1
Recall: (1 / 3) 33.33% Precision: (0 / 2) 0% F1: 0%
--------------------------------------------------------------------------
====== TOTALS =======
Identification of Mentions: Recall: (2 / 2) 100% Precision: (2 / 2) 100% F1: 100%
--------------------------------------------------------------------------
Coreference: Recall: (1 / 3) 33.33% Precision: (0 / 2) 0% F1: 0%
--------------------------------------------------------------------------
andreas@thinkpad:~/code/coval/% python3 scorer.py /tmp/dups.txt /tmp/dups.txt
Warning: A single mention is assigned to more than one cluster: [0, 1]
Warning: A single mention is assigned to more than one cluster: [0, 1]
recall precision F1
mentions 100.00 100.00 100.00
muc 100.00 100.00 100.00
bcub 100.00 100.00 100.00
ceafe 100.00 100.00 100.00
ceafm 100.00 100.00 100.00
lea 66.67 66.67 66.67
CoNLL score: 100.00
Since the file is being scored against itself, one would expect all scores to be 100%, don't you agree?
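Working through the LEA definition (Moosavi & Strube 2016) by hand supports that expectation. Below is a minimal, self-contained sketch of LEA recall (precision is the same call with key and response swapped), using the common convention that a singleton is resolved only if it also appears as a singleton in the response. Scoring the two entities of dups.txt against themselves, duplicate included, gives 1.0. The 66.67 from the Python scorer is consistent with the duplicated mention being counted in only one of its two clusters on one side: the singleton then resolves nothing, and the score becomes (1*0 + 2*1)/3 = 2/3.

from itertools import combinations

def lea_recall(key, response):
    # LEA: each key entity contributes its size, weighted by the
    # fraction of its coreference links found in the response.
    num = den = 0.0
    for k in key:
        if len(k) == 1:
            # Self-link convention: a singleton is resolved iff the
            # response also contains it as a singleton.
            resolved, total = (1 if k in response else 0), 1
        else:
            total = len(k) * (len(k) - 1) // 2
            resolved = sum(
                1 for m1, m2 in combinations(sorted(k), 2)
                if any(m1 in r and m2 in r for r in response)
            )
        num += len(k) * resolved / total
        den += len(k)
    return num / den

# Entities from dups.txt: a1 is both a singleton and part of {a1, b1}.
entities = [frozenset({"a1"}), frozenset({"a1", "b1"})]
print(lea_recall(entities, entities))  # 1.0 -> LEA F1 should be 100%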