inception-external-recommender
SklearnMentionDetector error in BIO encoding
I think the line below
https://github.com/inception-project/inception-external-recommender/blob/41d894c05053720d2b37510568d11d36433c3cf9/ariadne/contrib/sklearn.py#L92
should be
if token.begin >= annotation.begin and token.end <= annotation.end:
Also, I think the state machine is wrong: if a single annotation covers more than 2 tokens, the result is BIBI rather than BIII. The code only generates an I-MENTION if the preceding token is B-MENTION. What it should do is generate I-MENTION if the previous token is B-MENTION or I-MENTION and we're still in the same annotation.
I replaced lines 88-103 with the following. I'm not 100% sure it's correct or robust, though:
    for token in tokens:
        tag = "O"
        for annotation in annotations:
            if token.begin >= annotation.begin and token.end <= annotation.end:
                if token.begin == annotation.begin:
                    tag = "B-MENTION"
                elif token.end <= annotation.end:
                    tag = "I-MENTION"
                break
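To sanity-check the idea outside the recommender, here is a minimal self-contained sketch of the same tagging logic. It assumes tokens and annotations only need `begin` and `end` character offsets; the `Span` class and the `bio_tags` function are hypothetical names for illustration and are not part of ariadne.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for a token or annotation with character offsets."""
    begin: int
    end: int


def bio_tags(tokens: List[Span], annotations: List[Span]) -> List[str]:
    """Assign one BIO tag per token based on containment in an annotation."""
    tags = []
    for token in tokens:
        tag = "O"
        for annotation in annotations:
            # The token lies completely inside this annotation.
            if token.begin >= annotation.begin and token.end <= annotation.end:
                # The first token of the annotation opens the mention,
                # every later token inside it continues the mention.
                if token.begin == annotation.begin:
                    tag = "B-MENTION"
                else:
                    tag = "I-MENTION"
                break
        tags.append(tag)
    return tags


# "New York City is big": one annotation covering the first three tokens.
tokens = [Span(0, 3), Span(4, 8), Span(9, 13), Span(14, 16), Span(17, 20)]
annotations = [Span(0, 13)]
print(bio_tags(tokens, annotations))
# ['B-MENTION', 'I-MENTION', 'I-MENTION', 'O', 'O']
```

Two caveats with this sketch: a token that only partially overlaps an annotation is tagged O, and if an annotation does not start exactly on a token boundary, its first contained token gets I-MENTION instead of B-MENTION.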
I will have a look. I never really used this recommender so there certainly might be bugs in there.