entity-fishing

Effect of the different possible parameter combinations for ‘mentions’ in the REST API

Open aa303554 opened this issue 3 years ago • 2 comments

The following observations come from the online API.

1/ When you reverse the order of ‘wikipedia’ and ‘ner’ in the mentions parameter, the result is different. Namely, when ‘ner’ comes second, NER isn’t performed at all. The documentation doesn’t cover this constraint.

Result with wikipedia first and ner second:

[image: screenshot of the API result, not preserved]

Result with ner first and wikipedia second:

[image: screenshot of the API result, not preserved]

aa303554 avatar May 03 '21 12:05 aa303554

Thanks for reporting this @aa303554. This does indeed look like a bug, since the order of the processes used to extract mentions should not change the results. I need to look into it. For the time being, keep them in the order ["ner", "wikipedia"].
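For illustration, here is a minimal sketch of a query payload that lists the recognizers in the suggested order. The `text`, `language` and `mentions` field names are assumed to match the documented query format, `build_query` is a hypothetical helper, and the sample sentence is made up. Sending the payload to the service is not shown.

```python
import json


def build_query(text, mentions, lang="en"):
    """Serialize a disambiguation query (hypothetical helper).

    The order of `mentions` is preserved as given, since the thread
    shows that the order affects the result.
    """
    payload = {
        "text": text,
        "language": {"lang": lang},
        "mentions": list(mentions),
    }
    return json.dumps(payload)


# Recognizers listed in the order recommended above: NER first, then Wikipedia.
query = build_query(
    "Austria invaded and fought the Serbian army.",
    ["ner", "wikipedia"],
)
print(query)
```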

lfoppiano avatar May 19 '21 01:05 lfoppiano

Mmm, it's not a bug: the result depends on the order, and that is the expected behavior. The order actually has to be taken into account.

The mentions field gives the list of "mention recognizers" to be applied successively. If a mention is already recognized by wikipedia, it is not "overwritten" by the NER mention. Similarly, if an NER mention is found, the wikipedia one does not apply. In general we must start with the most specific mention recognizer and finish with the most generic one, Wikipedia.

This is probably easier to understand when using a specialized mention recognizer, like a module that recognizes species names. It has to be applied first because it's the most specific (it already disambiguates the species name, and wikipedia is not as precise). However, the tool has no way to know in advance which recognizer is the most specific, so the order is used. Does that make sense to you?
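To make the ordering rule concrete, here is a toy sketch of successive recognizers in which an earlier mention is never overwritten by a later one. It is not the entity-fishing implementation: the overlap test, the phrase-based recognizers and all names below are assumptions chosen for illustration.

```python
def phrase_recognizer(phrases):
    """Build a toy recognizer that finds the first occurrence of each phrase."""
    def recognize(text):
        spans = []
        for phrase in phrases:
            start = text.find(phrase)
            if start != -1:
                spans.append((start, start + len(phrase)))
        return spans
    return recognize


def overlaps(a, b):
    """True if two (start, end, ...) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def apply_recognizers(text, recognizers):
    """Apply recognizers in order; earlier mentions win over later overlapping ones."""
    kept = []
    for name, recognize in recognizers:
        for start, end in recognize(text):
            candidate = (start, end, name)
            if not any(overlaps(candidate, mention) for mention in kept):
                kept.append(candidate)
    return sorted(kept)


text = "Homo sapiens evolved in Africa"
species = ("species", phrase_recognizer(["Homo sapiens"]))
wikipedia = ("wikipedia", phrase_recognizer(["Homo", "Africa"]))

# Most specific recognizer first: the species mention is kept.
specific_first = apply_recognizers(text, [species, wikipedia])
# Generic recognizer first: "Homo" is taken by wikipedia, so the species mention is dropped.
generic_first = apply_recognizers(text, [wikipedia, species])
print(specific_first)
print(generic_first)
```

With the species recognizer first, the full "Homo sapiens" span survives and Wikipedia only contributes "Africa". With the order reversed, the generic "Homo" mention blocks the more precise one, which is why the most specific recognizer should come first.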

OK, we need to update the documentation to clarify that.

kermitt2 avatar May 19 '21 02:05 kermitt2