seqeval

Non-NER tags are missing one letter

Open versae opened this issue 4 years ago • 7 comments

How to reproduce the behaviour

If I run the following code with POS tags:

from seqeval.metrics import classification_report

y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]
y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]
print(classification_report(y_true, y_pred))

What I get is:

              precision    recall  f1-score   support

        CONJ       0.50      1.00      0.67         1
          DJ       1.00      1.00      1.00         2
         ERB       1.00      1.00      1.00         1
          ET       0.00      0.00      0.00         1
         ONJ       0.50      1.00      0.67         1
         OUN       1.00      1.00      1.00         1
          RT       0.00      0.00      0.00         1
          UX       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9

Here, every tag is missing its first letter. If I pass suffix=True, the last letter is dropped instead:

              precision    recall  f1-score   support

          AD       1.00      1.00      1.00         2
          AR       0.00      0.00      0.00         1
          AU       1.00      1.00      1.00         1
         CON       0.50      1.00      0.67         1
          DE       0.00      0.00      0.00         1
         NOU       1.00      1.00      1.00         1
        SCON       0.50      1.00      0.67         1
         VER       1.00      1.00      1.00         1

   micro avg       0.78      0.78      0.78         9
   macro avg       0.62      0.75      0.67         9
weighted avg       0.67      0.78      0.70         9

Moreover, one-letter tags (such as X) are left out of the report entirely.

Your Environment

  • Operating System: Ubuntu 20.10
  • Python Version: Python 3.8.6
  • Package Version: seqeval==1.2.2

versae • Apr 15 '21 11:04

Can confirm I have the same issue.

IssamAssafi • Apr 27 '21 09:04

Same issue here. It does not work with POS tags.

mirfan899 • May 12 '21 12:05

I have the same issue here!

DuyguA • Jun 17 '21 12:06

Same issue!

liaeh • Jun 24 '21 09:06

This problem only occurs when the tags lack IOB-style prefixes, e.g. ENTITY instead of B-ENTITY, I-ENTITY, ... I think it is caused by line 189, which strips the first character of the tag name on the assumption that it is a prefix.
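
A minimal sketch of that assumption, not the library's actual code: the helper name `split_tag` is hypothetical, and it simply treats one character of every tag as an IOB marker. Under that assumption it reproduces the truncated labels seen in both reports above, and in this sketch a one-letter tag such as X is left with no label text at all.

```python
def split_tag(chunk, suffix=False):
    """Hypothetical helper: split a tag into (marker, label), assuming
    exactly one marker character is always present."""
    if suffix:
        # Marker is assumed to be the last character, e.g. "PER-B".
        marker, label = chunk[-1], chunk[:-1]
    else:
        # Marker is assumed to be the first character, e.g. "B-PER".
        marker, label = chunk[0], chunk[1:]
    # Drop the hyphen that joins marker and label in IOB-style tags.
    return marker, label.strip("-")


# IOB-style tags survive; plain POS tags lose a letter.
print([split_tag(t)[1] for t in ["B-PER", "ADJ", "SCONJ", "X"]])
# ['PER', 'DJ', 'CONJ', '']
print([split_tag(t, suffix=True)[1] for t in ["PER-B", "ADJ", "SCONJ", "X"]])
# ['PER', 'AD', 'SCON', '']
```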

liaeh • Aug 16 '21 14:08

Thanks for finding the key line, @liaeh! As I see it then, we only have two options:

  1. We re-label our datasets if not IOB-style to start each label with B.
  2. We add an option to the library to not remove the first character if not IOB-style.

versae • Aug 30 '21 13:08

> Thanks for finding the key line, @liaeh! As I see it then, we only have two options:
>
> 1. We re-label our datasets if not IOB-style to start each label with `B`.
> 2. We add an option to the library to not remove the first character if not IOB-style.

Option 2 would make the most sense! I've been using option 1 as a workaround though :)
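
For anyone else landing here, the option 1 workaround can be sketched roughly as follows. The helper name `add_iob_prefix` and the choice of "B-" (rather than a bare "B") as the prefix are assumptions for illustration. Every token then counts as its own one-token entity, so the scorer has a prefix to strip and the tag name stays intact.

```python
def add_iob_prefix(sequences, prefix="B-"):
    """Hypothetical helper: prepend an IOB marker to every tag in every
    sequence, so each token is scored as a separate one-token entity."""
    return [[prefix + tag for tag in seq] for seq in sequences]


y_pred = [['ADJ', 'CONJ', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'SCONJ', 'X']]
y_true = [['ADJ', 'DET', 'VERB', 'AUX', 'NOUN', 'ADJ', 'SCONJ'], ['CONJ', 'ART', 'X']]

y_pred_iob = add_iob_prefix(y_pred)
y_true_iob = add_iob_prefix(y_true)
print(y_true_iob[1])  # ['B-CONJ', 'B-ART', 'B-X']

# With seqeval installed, the relabelled sequences are scored as before:
# from seqeval.metrics import classification_report
# print(classification_report(y_true_iob, y_pred_iob))
```

Note that this only suits data with no "outside" tag: an existing O label would be turned into B-O and scored as an entity.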

liaeh • Sep 02 '21 07:09