Strange RT2KB and Typing scores
Hello,
The RT2KB and Typing processes give strange scores compared to other scorers. Every time I run an RT2KB process on a NIF dataset, I get the exact same score for Precision, Recall and F1, which is quite odd (see this example). If I evaluate the same output with two other scorers (neleval and conlleval), both give me the same results, and they are much higher than what RT2KB gives me (P = 0.717, R = 0.765, F1 = 0.740).
The description of RT2KB says "the annotator gets a text and shall recognize the entities inside and their types", so I'm curious how the three measures can be equal for Typing when they differ for Recognition.
Any light on this would be welcomed :)
Thanks!
Thanks for that question. I can only give a general answer since you have uploaded a larger dataset. Uploading an example with a single document for which the evaluation results differ would give us an easier way of comparing the evaluations :wink:
In general, RT2KB does the following:
- it identifies entities that have been recognized correctly (Recognition step)
- from these correctly identified entities, it takes the types and calculates the hierarchical F-measure for them. (Errors in the recognition will lead to lower precision/recall in this calculation as well, since expected type information won't be available, etc.)
From the results of these two single steps, you can see that the benchmarked system got an F1-measure of 0.76 for each step. The combination of both can therefore not exceed that and will most probably have a lower F1-measure, since correctly identified entities might have a (partly) wrong type.
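As a rough illustration (this is only a sketch of the idea, not GERBIL's actual code), the two steps could look like this; an annotation is assumed to be a (start, end, set of type URIs) tuple and the type sets are assumed to be already expanded with their super classes:

```python
# Minimal sketch of the two RT2KB steps described above (NOT GERBIL's actual code).

def recognition_matches(gold, system):
    """Step 1: pair every system annotation with a gold annotation at the exact same position."""
    gold_by_pos = {(g[0], g[1]): g for g in gold}
    return [(gold_by_pos[(s[0], s[1])], s) for s in system if (s[0], s[1]) in gold_by_pos]

def type_precision_recall(gold_types, sys_types):
    """Step 2: precision/recall over the (expanded) type sets of a single matched entity."""
    tp = len(gold_types & sys_types)
    precision = tp / len(sys_types) if sys_types else 0.0
    recall = tp / len(gold_types) if gold_types else 0.0
    return precision, recall
```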
However, I would be happy to dig into this when you can provide a single example with the results from the other scorers :smiley:
Thanks @MichaelRoeder. I will prepare a single document with more specific details on how to reproduce this with GERBIL and the two other scorers ASAP and share it on this thread :)
The results with the conlleval scorer were a happy coincidence, because it does not evaluate by "offset" but by "token", so the way it evaluates the recognition is different. Sorry for that.
However, the neleval scorer has a similar behaviour to RT2KB and still gives a different result on this single document. Here are the GERBIL results and here is the TAC output (understood by the neleval scorer):
Gold Standard in TAC:
document-75 0 14 NIL0 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 33 38 http://dbpedia.org/resource/Paris 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 74 77 NIL0 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 92 94 NIL0 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 116 132 NIL1 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75 136 156 http://dbpedia.org/resource/Thessaloniki 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 158 161 NIL0 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 162 168 http://dbpedia.org/resource/Mother 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 170 184 NIL2 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 205 212 http://dbpedia.org/resource/Actor 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 227 241 NIL3 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
The equivalent in NIF:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix oke: <http://aksw.org/notInWiki/> .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242>
a nif:String, nif:RFC5147String, nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "242"^^xsd:nonNegativeInteger ;
nif:isString "Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,14>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Albert Modiano"@en ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "14"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Albert_Modiano ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,38>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Paris"@en ;
nif:beginIndex "33"^^xsd:nonNegativeInteger ;
nif:endIndex "38"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef dbpedia:Paris ;
itsrdf:taClassRef dul:Place ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=74,77>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "his"@en ;
nif:beginIndex "74"^^xsd:nonNegativeInteger ;
nif:endIndex "77"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Albert_Modiano ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=92,94>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "he"@en ;
nif:beginIndex "92"^^xsd:nonNegativeInteger ;
nif:endIndex "94"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Albert_Modiano ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=116,132>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Sephardic family"@en ;
nif:beginIndex "116"^^xsd:nonNegativeInteger ;
nif:endIndex "132"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Sephardi_family ;
itsrdf:taClassRef dul:Organization ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=136,156>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Thessaloniki, Greece"@en ;
nif:beginIndex "136"^^xsd:nonNegativeInteger ;
nif:endIndex "156"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef dbpedia:Thessaloniki ;
itsrdf:taClassRef dul:Place ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=158,161>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "His"@en ;
nif:beginIndex "158"^^xsd:nonNegativeInteger ;
nif:endIndex "161"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Albert_Modiano ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=162,168>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "mother"@en ;
nif:beginIndex "162"^^xsd:nonNegativeInteger ;
nif:endIndex "168"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef dbpedia:Mother ;
itsrdf:taClassRef dul:Role ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=170,184>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Louisa Colpijn"@en ;
nif:beginIndex "170"^^xsd:nonNegativeInteger ;
nif:endIndex "184"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Louisa_Colpijn ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=205,212>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "actress"@en ;
nif:beginIndex "205"^^xsd:nonNegativeInteger ;
nif:endIndex "212"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef dbpedia:Actor ;
itsrdf:taClassRef dul:Role ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=227,241>
a nif:String, nif:RFC5147String, nif:Phrase ;
nif:anchorOf "Louisa Colpeyn"@en ;
nif:beginIndex "227"^^xsd:nonNegativeInteger ;
nif:endIndex "241"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taIdentRef oke:Louisa_Colpeyn ;
itsrdf:taClassRef dul:Person ;
itsrdf:taSource "DBpedia 2014"^^xsd:string .
System output in TAC:
document-75 170 184 http://dbpedia.org/resource/National_Register_of_Historic_Places_listings_in_Iowa 5.4756873E-7 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 136 156 http://dbpedia.org/resource/Greece 1.4326925E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 0 14 http://dbpedia.org/resource/University_of_Chicago 5.789066E-6 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 116 135 http://dbpedia.org/resource/Family_(biology) 3.2394513E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75 205 212 http://dbpedia.org/resource/Actor 2.6748134E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 33 39 http://dbpedia.org/resource/Paris 4.2364663E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 158 161 http://dbpedia.org/resource/Hit_(baseball) 2.1313697E-6 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 92 94 http://dbpedia.org/resource/Netherlands 1.5448735E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 74 86 http://dbpedia.org/resource/Rhineland-Palatinate 4.3240807E-6 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 227 233 http://dbpedia.org/resource/List_of_Animaniacs_characters 4.727223E-7 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 234 241 NILfbc8560d-e7b1-4207-8856-0de7b142075f 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 162 168 http://dbpedia.org/resource/Scotland 2.1532596E-5 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
Here is the equivalent NIF output:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=170,184>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Louisa Colpijn" ;
nif:beginIndex "170"^^xsd:nonNegativeInteger ;
nif:endIndex "184"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=92,94>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "he" ;
nif:beginIndex "92"^^xsd:nonNegativeInteger ;
nif:endIndex "94"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,14>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Albert Modiano" ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "14"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=136,156>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Thessaloniki, Greece" ;
nif:beginIndex "136"^^xsd:nonNegativeInteger ;
nif:endIndex "156"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Place .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=33,39>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Paris)" ;
nif:beginIndex "33"^^xsd:nonNegativeInteger ;
nif:endIndex "39"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Place .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=234,241>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Colpeyn" ;
nif:beginIndex "234"^^xsd:nonNegativeInteger ;
nif:endIndex "241"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=158,161>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "His" ;
nif:beginIndex "158"^^xsd:nonNegativeInteger ;
nif:endIndex "161"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=205,212>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "actress" ;
nif:beginIndex "205"^^xsd:nonNegativeInteger ;
nif:endIndex "212"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Role .
<http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242>
a nif:String , nif:RFC5147String , nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "242"^^xsd:nonNegativeInteger ;
nif:isString "Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=74,86>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "his paternal" ;
nif:beginIndex "74"^^xsd:nonNegativeInteger ;
nif:endIndex "86"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Person .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=227,233>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Louisa" ;
nif:beginIndex "227"^^xsd:nonNegativeInteger ;
nif:endIndex "233"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Role .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=162,168>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "mother" ;
nif:beginIndex "162"^^xsd:nonNegativeInteger ;
nif:endIndex "168"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Role .
<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=116,135>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Sephardic family of" ;
nif:beginIndex "116"^^xsd:nonNegativeInteger ;
nif:endIndex "135"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
itsrdf:taClassRef dul:Organization .
The scorer is available here and the command line to run the evaluation is:
./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac
And here is the output I get:
ptp fp rtp fn precis recall fscore measure
7 5 7 4 0.583 0.636 0.609 strong_typed_mention_match
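For reference, my understanding of these columns (an assumption on my side: ptp/fp feed the precision and rtp/fn the recall) is that the scores come straight from the counts:

```python
# Assumption about the neleval columns: ptp/fp are used for precision, rtp/fn for recall.
ptp, fp, rtp, fn = 7, 5, 7, 4
precision = ptp / (ptp + fp)                        # 7 / 12 ≈ 0.583
recall = rtp / (rtp + fn)                           # 7 / 11 ≈ 0.636
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.609
```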
It seems to me that the neleval output:
ptp fp rtp fn precis recall fscore measure
7 5 7 4 0.583 0.636 0.609 strong_typed_mention_match
corresponds to the "Entity Recognition" score provided by GERBIL at http://gerbil.aksw.org/gerbil/experiment?id=201711270022
However, the strong_typed_mention_match SHOULD correspond to the "Entity Typing". Is this the issue?
No, it should correspond to the first line, where 0.4375 | 0.4375 | 0.4375 is written. Entity Typing is something else.
Basically "strong_typed_mention_match" in neleval == "RT2KB" in GERBIL and "strong_mention_match" in neleval == "Entity Recognition" in GERBIL.
The example I gave is a case where the extraction score is equal to the recognition score, because all 7 (out of 11 in total) correctly extracted mentions have their proper type attached. Look at the "TP", "FN" and "FP" values, they are equal:
./nel evaluate -m strong_mention_match -f tab -g gold_standard.tac system_output.tac
ptp fp rtp fn precis recall fscore measure
7 5 7 4 0.583 0.636 0.609 strong_mention_match
./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac
ptp fp rtp fn precis recall fscore measure
7 5 7 4 0.583 0.636 0.609 strong_typed_mention_match
@jplu thanks for this example. Going through it manually, I have calculated the same result as the neleval script.
GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec Matching | hier. Matching |
---|---|---|---|---|---|---|---|---|
0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp |
33 | 5 | dbr:Paris | dul:Place | 33 | 6 | dul:Place | fp, fn | fp, fn |
74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fp, fn | fp, fn |
92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp |
116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fp, fn | fp, fn |
136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp |
158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp |
162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp |
170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp |
205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp |
227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fp, fn | fp, fn |
--- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp |
These numbers lead to precision=0.583, recall=0.636 and F1-score=0.609.
So, what I gathered so far is that GERBIL identifies the cases as described in the table above. However, the numbers calculated from these counts are not correct. We will search for the problem and update GERBIL.
Thanks @MichaelRoeder! Let me know once the bug is fixed.
Hi,
sorry it took me so long. Much to do right now.
Is there an open endpoint, or could you provide me the ADEL web service URL (here or via DM)? It would be much easier for me to check against the actual web service.
@TortugaAttack I have reproduced the problem using the two NIF files listed above. You can use the FileBasedNIFDataset for loading the data and the InstanceListBasedAnnotator to load the result file of the annotator and simulate the behaviour of an annotator (you have to make sure that the URIs of the documents in both files are the same - I think the annotator result NIF above has a different URI for the document, so this needs to be replaced).
Based on that, you should add a JUnit test (you can copy and adapt the SingleRunTest for that).
Well, I found a problem in the hierarchical measure: if the annotator provides wrong results, e.g.:
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Paris)" ;
nif:beginIndex "33"^^xsd:nonNegativeInteger ;
nif:endIndex "39"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taClassRef dul:Place .
It will be counted as tp=0, fp=1, fn=0.
By removing the part where annotations are not in the gold standard, the results match yours.
I guess it is debatable whether ETyping should take Recognition into account too. I can remove it and everything would match the results, or leave it in and we should document this in the wiki. The unit test will be changed according to what it should be.
I do not see how this solves the issue, since we have to count it as a false positive - as is done in the table above as well. However, if it solved the problem for you, it might be possible that we count it twice... right?
No, it is not done in the table above. In the table above you have the 11 entities which are in the gold standard (and one row with "---", I am not sure what you mean by that). In GERBIL we currently have 16: 11 from the gold standard (which are counted correctly according to the table) + 5 from the annotator which are not in the gold standard.
Again, for example:
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Paris)" ;
nif:beginIndex "33"^^xsd:nonNegativeInteger ;
nif:endIndex "39"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taClassRef dul:Place .
This annotation is not counted in your table.
If we ignore those entities that are not in the gold standard, we get the results you calculated. If not, we get other results.
The table is structured by the gold standard entities (11) with the entities of the system answers (12) mapped to them. The last system answer does not match any gold standard entity (that is the reason for the ---). Apart from that, there are 4 entities from the system that do not exactly match the gold standard (like the "Paris)" example you described). So the table does contain 16 distinct entities :wink:
- We fixed a bug in the hierarchical F1-measure counting that could lead to doubling the number of fp counts.
- Apart from that, there is a misunderstanding in the calculation of the hierarchical F-measure, and the table that I posted before shows exactly this misunderstanding: when evaluating the results of an annotation system, the evaluation cannot match "Paris" and "Paris)" as we have done in the table above. A human would automatically put them in the same line, but for the evaluation these two entities are different and have to be handled separately. Thus, the updated table looks like the following.
GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec Matching | hier. Matching | hier. prec | hier. recall | hier. F1 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
33 | 5 | dbr:Paris | dul:Place | --- | --- | --- | fn | fn | 0.0 | 0.0 | 0.0 |
74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fn | fn | 0.0 | 0.0 | 0.0 |
92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fn | fn | 0.0 | 0.0 | 0.0 |
136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp | 1.0 | 1.0 | 1.0 |
158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0 |
170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0 |
227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fn | fn | 0.0 | 0.0 | 0.0 |
--- | --- | --- | --- | 33 | 6 | dul:Place | fp | fp | 0.0 | 0.0 | 0.0 |
--- | --- | --- | --- | 74 | 12 | dul:Place | fp | fp | 0.0 | 0.0 | 0.0 |
--- | --- | --- | --- | 116 | 19 | dul:Organization | fp | fp | 0.0 | 0.0 | 0.0 |
--- | --- | --- | --- | 227 | 6 | dul:Role | fp | fp | 0.0 | 0.0 | 0.0 |
--- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp | 0.0 | 0.0 | 0.0 |
For the recognition of entities, there is no difference, since we can simply sum up the tp, fp and fn counts. However, for the hierarchical F-measure this is not possible: when evaluating the typing, we have to compare trees/hierarchies of types, which can lead to more than one tp, fp or fn per comparison. Since we want to treat the single entities equally, GERBIL calculates the precision, recall and F1-measure for every entity (as shown in the table above). The averages of these values are the precision, recall and F1-measure scores for the complete document (for the example above, precision=7/16, recall=7/16 and F1-score=7/16).
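As a small illustrative sketch (not GERBIL's actual code), the document score for this example boils down to averaging the per-entity values from the table:

```python
# Illustrative sketch only: the typing score of the document is the plain average
# of the per-entity precision/recall/F1 values from the table above. Here every
# entity scores either 1.0 or 0.0 because the matched entities carry exactly the
# right type, so precision, recall and F1 all end up identical.
per_entity_scores = [1.0] * 7 + [0.0] * 9  # 11 gold + 12 system annotations, 7 shared => 16 entities
doc_score = sum(per_entity_scores) / len(per_entity_scores)
print(doc_score)  # 0.4375 for precision, recall and F1 alike in this example
```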
@jplu @rtroncy I know it is not the most intuitive implementation :smiley:. It is arguable whether it is okay to have a "missed" entity not only counted as fn but with precision and recall = 0, and to count the (nearly matching) fp entity again with precision and recall = 0. The only alternative that I can think of is a complicated weighting of the hierarchical tp, fp and fn counts to ensure that entities with a complex type hierarchy don't have a larger influence on the result compared to entities with an "easy" set of types.
Thanks @MichaelRoeder and @TortugaAttack. I can fully understand your concerns about the scoring issue I raised, but my point is more about being aligned with the well-known and popular neleval scorer.
Personally I think that the annotation:
<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
a nif:String , nif:RFC5147String , nif:Phrase ;
nif:anchorOf "Paris)" ;
nif:beginIndex "33"^^xsd:nonNegativeInteger ;
nif:endIndex "39"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
itsrdf:taClassRef dul:Place .
must be counted as a "false positive" AND a "false negative" (if the system does not propose nested entities), because the offsets do not match; and then the type, even if it is the good one, should not be taken as a true positive but also as a "false positive" AND a "false negative", as in the recognition step. This is how neleval works, and I'm OK with that because it seems logical to me.
Please, can you let me know once the fix has been pushed to the public instance of GERBIL? I will rerun my scoring script and then compare GERBIL and neleval.
Of course, we will let you know. However, I think we still have a small misunderstanding.
Let's focus on the "Paris" / "Paris)" example. I totally agree that the recognition step has to count this as fp AND fn. I think there is no discussion regarding this point :wink: I want to underline that the typing step is not able to see "Paris)" as an attempt to match "Paris". It will handle them as two separate entities and calculate precision, recall and F1-measure for each of them (for the reasons explained above). Therefore, it will count this 2 times with precision, recall, F1-score = 0 (not 1×fp and 1×fn), which leads to the overall evaluation scores of precision, recall, F1-score = 0.4375, which might be lower than expected.