seqeval
Non-NER tags are missing one letter
How to reproduce the behaviour
If I run the following code with POS tags:

```python
from seqeval.metrics import classification_report

y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]
y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]
print(classification_report(y_true, y_pred))
```
What I get is:
```
              precision    recall  f1-score   support

        CONJ       0.50      1.00      0.67         1
          DJ       1.00      1.00      1.00         2
         ERB       1.00      1.00      1.00         1
          ET       0.00      0.00      0.00         1
         ONJ       0.50      1.00      0.67         1
         OUN       1.00      1.00      1.00         1
          RT       0.00      0.00      0.00         1
          UX       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9
```
Every tag is missing its first letter (e.g. `SCONJ` shows up as `CONJ`, and `CONJ` as `ONJ`). If I pass `suffix=True`, the tags lose their last letter instead:
```
              precision    recall  f1-score   support

          AD       1.00      1.00      1.00         2
          AR       0.00      0.00      0.00         1
          AU       1.00      1.00      1.00         1
         CON       0.50      1.00      0.67         1
          DE       0.00      0.00      0.00         1
         NOU       1.00      1.00      1.00         1
        SCON       0.50      1.00      0.67         1
         VER       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9
```
Moreover, single-letter tags (here `X`) are ignored entirely: they do not appear in either report.
Your Environment
- Operating System: Ubuntu 20.10
- Python Version: Python 3.8.6
- Package Version: seqeval==1.2.2
Can confirm I have the same issue.
Same issue here. It does not work with POS tags.
I have the same issue here!
same issue!!
This problem only occurs if your tags lack IOB-style prefixes, e.g. `ENTITY` instead of `B-ENTITY`, `I-ENTITY`, ... I think it is caused by line 189, which strips the first character of the tag name because it assumes the tag has a prefix.
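To illustrate the mechanism described above, here is a simplified imitation of the splitting step. It is not seqeval's actual source (the name `naive_split` is made up for this sketch); it only assumes, as the comment suggests, that the first character (or the last one, with `suffix=True`) is always treated as the IOB marker. Under that assumption it reproduces the truncated names seen in both reports, and a single-letter tag such as `X` ends up with an empty type, which would be consistent with it vanishing from the report.

```python
from typing import Tuple


def naive_split(tag: str, suffix: bool = False) -> Tuple[str, str]:
    """Split a tag into (marker, type), ALWAYS assuming a one-character marker.

    Simplified imitation for illustration only; not seqeval's actual code.
    """
    if suffix:
        # Suffix mode: last character is taken as the marker, e.g. "PER-B" -> ("B", "PER")
        return tag[-1], tag[:-1].rsplit("-", maxsplit=1)[0]
    # Prefix mode: first character is taken as the marker, e.g. "B-PER" -> ("B", "PER")
    return tag[0], tag[1:].split("-", maxsplit=1)[-1]


# Properly prefixed tags split as intended; bare POS tags lose a letter.
for tag in ["B-PER", "ADJ", "SCONJ", "X"]:
    print(f"{tag!r:8} prefix mode -> {naive_split(tag)[1]!r}")
for tag in ["PER-B", "ADJ", "SCONJ", "X"]:
    print(f"{tag!r:8} suffix mode -> {naive_split(tag, suffix=True)[1]!r}")
```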
Thanks for finding the key line, @liaeh! As I see it then, we only have two options:
1. We re-label our datasets, if they are not IOB-style, so that each label starts with `B`.
2. We add an option to the library to not remove the first character if the tags are not IOB-style.
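Option 2 could look roughly like the sketch below: strip a marker only when the tag actually carries one. `split_tag_if_prefixed` and `IOB_MARKERS` are hypothetical names, not part of seqeval's API, and the sketch assumes markers are separated by `-` as in `B-PER` or `PER-B`.

```python
from typing import Tuple

# Hypothetical set of IOB-style markers (assumption, mirrors B/I/E/S schemes).
IOB_MARKERS = ("B", "I", "E", "S")


def split_tag_if_prefixed(tag: str, suffix: bool = False) -> Tuple[str, str]:
    """Return (marker, type), removing a marker only if one is actually present.

    Tags without a recognised "M-" prefix (or "-M" suffix) are kept whole
    and returned with an empty marker.
    """
    if suffix:
        head, sep, marker = tag.rpartition("-")
        if sep and marker in IOB_MARKERS:
            return marker, head
    else:
        marker, sep, rest = tag.partition("-")
        if sep and marker in IOB_MARKERS:
            return marker, rest
    # No recognisable marker: treat the whole tag as the type.
    return "", tag


print(split_tag_if_prefixed("B-PER"))  # prefixed tag is split as before
print(split_tag_if_prefixed("ADJ"))    # bare POS tag keeps all its letters
```

A real patch would presumably still need to treat `O` and bare marker tags the way seqeval does today; this sketch simply returns them whole.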
Option 2 would make most sense! I've been using option 1 as a workaround though :)
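For anyone needing the option 1 workaround in the meantime, a minimal sketch, assuming plain POS tags with no existing prefixes: prepend `B-` to every tag before scoring. `add_b_prefix` is a hypothetical helper, not part of seqeval. Because every token then begins its own chunk, adjacent tokens with the same tag are still scored separately, which is what token-level POS evaluation needs.

```python
from typing import List


def add_b_prefix(sequences: List[List[str]], prefix: str = "B-") -> List[List[str]]:
    """Prepend an IOB 'B-' marker to every tag so each token is its own chunk.

    Assumption: a literal "O" tag means "outside" and is left untouched.
    """
    return [[tag if tag == "O" else prefix + tag for tag in seq] for seq in sequences]


y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]
y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]

y_true_b = add_b_prefix(y_true)
y_pred_b = add_b_prefix(y_pred)
print(y_true_b[1])
```

The prefixed lists can then be passed to `classification_report(y_true_b, y_pred_b)`; with the marker in place, the tags (including `X`) should be reported under their full names.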