inception-external-recommender

SklearnMentionDetector error in BIO encoding

Open david-waterworth opened this issue 3 years ago • 2 comments

I think the line below

https://github.com/inception-project/inception-external-recommender/blob/41d894c05053720d2b37510568d11d36433c3cf9/ariadne/contrib/sklearn.py#L92

should be

if token.begin >= annotation.begin and token.end <= annotation.end:

david-waterworth, Apr 21 '21 05:04

Also, I think the state machine is wrong: if a single annotation spans more than 2 tokens, the result is BIBI rather than BIII. The code only generates an I-MENTION if the preceding token is B-MENTION. What it should do is generate I-MENTION if the previous token is B-MENTION or I-MENTION and we're still in the same annotation.

I replaced lines 88-103 with the following - I'm not 100% sure it's correct / robust though

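for token in tokens:
    tag = "O"  # default: token is outside every annotation
    for annotation in annotations:
        # token lies entirely within this annotation's span
        if token.begin >= annotation.begin and token.end <= annotation.end:
            if token.begin == annotation.begin:
                tag = "B-MENTION"  # first token of the mention
            else:
                tag = "I-MENTION"  # any later token of the same mention
            break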
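
To sanity-check the tagging logic outside the recommender, here is a minimal self-contained sketch. `Span` and `bio_tags` are hypothetical names made up for this illustration; the only assumption about the real token and annotation objects is that they expose `begin` and `end` character offsets. It also collects the tags into a list, which the snippet above leaves to the surrounding code.

```python
from collections import namedtuple

# Hypothetical stand-in for the token/annotation objects:
# only the `begin` and `end` character offsets are assumed.
Span = namedtuple("Span", ["begin", "end"])


def bio_tags(tokens, annotations):
    """Return one BIO tag per token, based on span containment."""
    tags = []
    for token in tokens:
        tag = "O"  # default: outside every annotation
        for annotation in annotations:
            # Token lies entirely within this annotation's span.
            if token.begin >= annotation.begin and token.end <= annotation.end:
                if token.begin == annotation.begin:
                    tag = "B-MENTION"
                else:
                    tag = "I-MENTION"
                break
        tags.append(tag)
    return tags


# "New York City is big", with one annotation covering "New York City".
tokens = [Span(0, 3), Span(4, 8), Span(9, 13), Span(14, 16), Span(17, 20)]
annotations = [Span(0, 13)]
tags = bio_tags(tokens, annotations)
print(tags)  # ['B-MENTION', 'I-MENTION', 'I-MENTION', 'O', 'O']
```

With a three-token annotation this gives BII rather than BIB. Because the tag depends only on offsets and not on the previous tag, two adjacent annotations each start with their own B-MENTION. Overlapping annotations are not handled specially here: the first matching annotation wins.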

david-waterworth, Apr 21 '21 06:04

I will have a look. I never really used this recommender so there certainly might be bugs in there.

jcklie, Apr 21 '21 07:04