dkpro-core
Some NER tools do not mark multi-word NEs
Some NER tools such as the CoreNlpNamedEntityRecognizer mark every token individually as a NE instead of creating a multi-token NE.

IMHO the default behavior should be to join adjacent NEs that carry the same label, unless the model uses a BIO-like encoding, in which case the BIO markers should be respected.
Also, the unit tests for the NER tools should be changed to include a multi-word NE, e.g. by changing `John` in the current unit tests to `John Smith`.
- [ ] CoGrOO Named Entity Recognizer
- [ ] CoreNLP Named Entity Recognizer (old API)
- [ ] CoreNLP Named Entity Recognizer
- [ ] Illinois CCG Named Entity Recognizer
- [ ] LingPipe Named Entity Recognizer
- [ ] NLP4J Named Entity Recognizer
- [ ] OpenNLP Named Entity Recognizer
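
To make the proposed default concrete, here is a minimal sketch of the joining rule in plain Java. It is not DKPro Core code: `NeMerger`, `Span`, and `merge` are hypothetical names, tokens are passed as plain begin/end offsets rather than UIMA annotations, and `O` is assumed to be the "outside" label. Adjacent tokens with the same label are joined into one span; a `B-` prefix always starts a new span, so BIO markers are respected when the model emits them.

```java
import java.util.ArrayList;
import java.util.List;

final class NeMerger {
    /** Hypothetical stand-in for a named entity annotation: a labeled character span. */
    static final class Span {
        final int begin;
        final int end;
        final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + begin + "," + end + "]";
        }
    }

    /**
     * Joins per-token labels into multi-token spans.
     *
     * @param tokens one {begin, end} offset pair per token
     * @param labels one label per token; "O" (assumed) means "not an entity"
     */
    static List<Span> merge(int[][] tokens, String[] labels) {
        List<Span> result = new ArrayList<>();
        String current = null; // label of the span being built, or null if none
        int begin = -1;
        int end = -1;

        for (int i = 0; i < tokens.length; i++) {
            String raw = labels[i];
            boolean outside = raw.equals("O");
            boolean startsNew = raw.startsWith("B-");
            boolean hasPrefix = startsNew || raw.startsWith("I-");
            String label = hasPrefix ? raw.substring(2) : raw;

            // Extend the open span if the label matches and no B- marker forces a break.
            if (!outside && !startsNew && label.equals(current)) {
                end = tokens[i][1];
                continue;
            }

            // Otherwise close the open span (if any) and possibly open a new one.
            if (current != null) {
                result.add(new Span(begin, end, current));
            }
            if (outside) {
                current = null;
            } else {
                current = label;
                begin = tokens[i][0];
                end = tokens[i][1];
            }
        }
        if (current != null) {
            result.add(new Span(begin, end, current));
        }
        return result;
    }
}
```

For the token offsets of `John Smith lives in New York` with the labels `PERSON PERSON O O LOCATION LOCATION`, this yields one `PERSON` span covering `John Smith` and one `LOCATION` span covering `New York`. With `B-PERSON B-PERSON` the two tokens stay separate entities.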