Reproducibility of WAT results
Dear authors,
First of all, thank you for the great work you do in making entity linking results more comparable.
My question is specifically about GERBIL's WAT annotator: I get different results when selecting WAT as an annotator in the A2KB task versus when I use my own NIF API, which simply forwards requests from GERBIL to the official WAT API.
My setup is as follows:
I built my own NIF API which forwards the text GERBIL posts to the WAT API at https://wat.d4science.org/wat/tag/tag.
I do not provide any additional parameters to the WAT API.
I take the result from the WAT API, extract the span start and end offsets from the start and end fields, and take the entity title from the title field.
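For reference, this is roughly what my forwarding step does (a minimal sketch in Python; the text and gcube-token parameter names and the annotations key in the response are my reading of the WAT API, while start, end and title are the fields mentioned above):

import requests

def wat_annotations(text, token):
    # Forward the text GERBIL posted to the official WAT tagging endpoint.
    resp = requests.get(
        "https://wat.d4science.org/wat/tag/tag",
        params={"gcube-token": token, "text": text},
    )
    resp.raise_for_status()
    # Each annotation carries the span offsets and the predicted Wikipedia title.
    for ann in resp.json().get("annotations", []):
        yield ann["start"], ann["end"], ann["title"]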
I create an entity URI as follows (in Python):
from urllib.parse import quote

# Build a DBpedia resource URI from the Wikipedia title returned by WAT.
entity_uri = "http://dbpedia.org/resource/" + quote(wiki_title.replace(" ", "_"))
Then I send the span and the entity URI back to GERBIL.
The results I get using this approach differ from those I get when simply selecting WAT as annotator in GERBIL. On KORE50 for example, I get a Micro InKB F1 score of 0.5512 using my NIF API and 0.5781 when selecting WAT as annotator. See this experiment: http://gerbil.aksw.org/gerbil/experiment?id=202409170001
I was wondering if GERBIL sets any additional parameters in the call to the WAT API or filters the returned entities by score using a threshold. Looking at the GERBIL code, I didn't see any of that, though. Can you confirm that GERBIL does not use additional API parameters and does not filter results by score? This would already help me narrow down the problem.
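In case a score threshold does turn out to play a role, I could try to reproduce it on my side by filtering the WAT annotations by their rho score and sweeping the cutoff, for example (rho being the confidence field I see in the WAT response; whether GERBIL looks at it at all is exactly my question):

def filter_by_score(annotations, threshold):
    # Drop annotations whose rho confidence is below the given threshold.
    return [a for a in annotations if a.get("rho", 0.0) >= threshold]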
I just realized that the results for the recognition task are the same, so the problem might be in the URI matching. How exactly does GERBIL create URIs from the Wikipedia titles predicted by WAT?
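While double-checking my own URI construction, I noticed one possible source of mismatch: Python's quote percent-encodes characters such as parentheses and apostrophes, whereas canonical DBpedia resource URIs usually leave them unescaped. If GERBIL builds its URIs differently, titles containing such characters would fail to match even though the annotation itself is identical. A small illustration (the title is just an example):

from urllib.parse import quote

title = "Paris (mythology)"
print("http://dbpedia.org/resource/" + quote(title.replace(" ", "_")))
# -> http://dbpedia.org/resource/Paris_%28mythology%29
print("http://dbpedia.org/resource/" + title.replace(" ", "_"))
# -> http://dbpedia.org/resource/Paris_(mythology)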
Any other hints as to where this discrepancy could come from are highly appreciated.
Many thanks in advance!