dkpro-core

Some NER tools do not mark multi-word NEs

Open • reckart opened this issue 7 years ago • 0 comments

Some NER tools, such as the CoreNlpNamedEntityRecognizer, mark each token individually as an NE instead of creating a single multi-token NE.

[Screenshot: 2018-05-23_11-43-44]

IMHO, the default behavior should be to join adjacent NEs that carry the same label, unless the model uses a BIO-like encoding, in which case the BIO markers should be respected.
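A minimal sketch of the proposed default in plain Java. `NeSpanMerger`, `merge` and `Span` are hypothetical names, not DKPro Core API. The sketch assumes the tagger output has already been reduced to one label string per token, with `O` (or `null`) for tokens outside any entity, `B-`/`I-` prefixes for BIO-encoded models, and bare labels (e.g. `PERSON`) for models that tag each token on its own. It requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of DKPro Core): turns per-token NE labels into entity spans. */
public class NeSpanMerger {

    /** A labelled span over token indices, begin inclusive, end exclusive. */
    public record Span(int begin, int end, String label) {}

    public static List<Span> merge(List<String> tags) {
        List<Span> spans = new ArrayList<>();
        int begin = -1;        // start index of the currently open span
        String current = null; // label of the open span, null if no span is open

        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            String label;
            boolean startsNew;

            if (tag == null || tag.equals("O")) {
                // Outside any entity.
                label = null;
                startsNew = false;
            } else if (tag.startsWith("B-")) {
                // BIO: B- always opens a new entity, even after one with the same label.
                label = tag.substring(2);
                startsNew = true;
            } else if (tag.startsWith("I-")) {
                // BIO: I- continues an open entity with the same label;
                // otherwise it is treated leniently as the start of a new one.
                label = tag.substring(2);
                startsNew = !label.equals(current);
            } else {
                // Plain per-token label: join with the previous token if the label matches.
                label = tag;
                startsNew = !label.equals(current);
            }

            // Close the open span if this token does not continue it.
            if (current != null && (label == null || startsNew)) {
                spans.add(new Span(begin, i, current));
                current = null;
            }
            // Open a new span if needed.
            if (label != null && startsNew) {
                begin = i;
                current = label;
            }
        }
        // Close a span that runs to the last token.
        if (current != null) {
            spans.add(new Span(begin, tags.size(), current));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Per-token output such as "John/PERSON Smith/PERSON lives/O in/O Paris/LOCATION".
        System.out.println(merge(List.of("PERSON", "PERSON", "O", "O", "LOCATION")));
        // BIO output: two adjacent persons stay separate because of the B- marker.
        System.out.println(merge(List.of("B-PER", "I-PER", "B-PER", "O")));
    }
}
```

Each resulting `Span` could then become one `NamedEntity` annotation running from the begin offset of token `begin` to the end offset of token `end - 1`. Without BIO markers, two distinct adjacent entities with the same label are indistinguishable from one multi-word entity, so joining them is a heuristic; that trade-off is why BIO markers should take precedence whenever the model provides them.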

Also, the unit tests for the NER tools should be changed to include a multi-word NE, e.g. by changing "John" in the current unit tests to "John Smith".

  • [ ] CoGrOO Named Entity Recognizer
  • [ ] CoreNLP Named Entity Recognizer (old API)
  • [ ] CoreNLP Named Entity Recognizer
  • [ ] Illinois CCG Named Entity Recognizer
  • [ ] LingPipe Named Entity Recognizer
  • [ ] NLP4J Named Entity Recognizer
  • [ ] OpenNLP Named Entity Recognizer

reckart, May 23 '18 09:05