Exclude sentence with only O

Open ericcanadas opened this issue 6 years ago • 1 comments

Hi,

More a question than an issue: is it useful to keep sentences that contain only O labels in the training set? Example (here, the sentence "The dog is brown"):

EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O

The O
dog O
is O
brown O

Peter B-PER
Blackburn I-PER

ericcanadas, Jun 27 '18 14:06

@EricC91 Yes, it is beneficial to keep sentences with only "O" labels. These are so-called negative examples. Having selected negatives in your training set makes your model much more robust to false positives (tagging where the model should not tag).
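To make "negative example" concrete, here is a minimal sketch that reads data in the token/label layout from the question and picks out the sentences whose labels are all "O". The helper names `read_conll` and `is_negative` are made up for illustration and are not part of crfsuite.

```python
def read_conll(text):
    """Split CoNLL-style text (one "token label" pair per line,
    blank line between sentences) into lists of (token, label) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.rsplit(None, 1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def is_negative(sentence):
    """True when every token in the sentence carries the "O" label."""
    return all(label == "O" for _, label in sentence)


SAMPLE = """EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O

The O
dog O
is O
brown O

Peter B-PER
Blackburn I-PER
"""

sentences = read_conll(SAMPLE)
negatives = [s for s in sentences if is_negative(s)]
print(len(sentences), "sentences,", len(negatives), "negative")
```

On the sample from the question this finds one negative sentence ("The dog is brown"), which is the kind of sentence worth keeping.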

From my experience building many custom NER models, it is beneficial to add negative examples in small batches. The ones to add in each iteration are the ones where the current model tags something it should not. After a couple of iterations on some random examples, your model will learn pretty quickly. The added examples must be diverse.
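One way to sketch a single round of that loop, assuming you have a pool of sentences known to contain no entities and some `predict(tokens)` function that returns one label per token. Both `predict` and `mine_hard_negatives` are hypothetical names; `toy_predict` below is only a stand-in so the snippet runs without a trained model.

```python
def mine_hard_negatives(predict, candidates, batch_size=20):
    """Return up to batch_size entity-free sentences that the current
    model wrongly tags, relabelled as all "O" for the training set."""
    hard = []
    for tokens in candidates:
        labels = predict(tokens)
        # Keep only sentences where the model produced a false positive.
        if any(label != "O" for label in labels):
            hard.append([(tok, "O") for tok in tokens])
        if len(hard) >= batch_size:
            break
    return hard


def toy_predict(tokens):
    """Stand-in for a real tagger (assumption): it tags every
    capitalized token as B-MISC, so it over-tags on purpose."""
    return ["B-MISC" if tok[0].isupper() else "O" for tok in tokens]


candidates = [
    ["The", "dog", "is", "brown"],
    ["it", "rains", "today"],
    ["Lamb", "is", "tasty"],
]

batch = mine_hard_negatives(toy_predict, candidates, batch_size=2)
for sentence in batch:
    print(sentence)
```

After each round you would append the batch to the training data, retrain, and run it again on fresh candidates. Keeping `batch_size` small and the candidate pool varied matches the advice above about small, diverse batches.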

usptact, Jun 27 '18 22:06