Does pynlp keep the original tag type "O" which is the non-entity part?

Open hexingren opened this issue 7 years ago • 3 comments

Hello,

Does pynlp keep the original tag type "O" which is the non-entity part?

For example, sentence = "Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife"

Expected result: [('Nora Jani', 'PERSON'), ('a single person', 'O'), ('Matt Jani', 'PERSON'), ('and', 'O'), ('Susan Jani', 'PERSON'), ('husband and wife', 'O')]

Thanks.

hexingren, Apr 27 '18 14:04

Yes, try this:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner')

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This will give you token-level named entity recognition, with non-entity tokens tagged "O".

If you want entities that span multiple tokens, use the entitymentions annotator:

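nlp = StanfordCoreNLP(annotators='entitymentions')

# Re-annotate the text so the document comes from the new pipeline
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for entity in document.entities:
    print(entity)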
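
If the entity list leaves out the "O" stretches, one way to get output shaped like the expected result is to merge consecutive tokens that share a tag. Below is a minimal sketch in plain Python, not part of pynlp. The helper name `group_by_tag` is hypothetical, and the tags in `tagged` are hand-written for illustration; in practice the pairs would presumably be built from `(str(token), token.ner)` in the loop above.

```python
from itertools import groupby


def group_by_tag(tagged_tokens):
    """Merge consecutive (word, tag) pairs that share a tag into (phrase, tag) spans."""
    spans = []
    # groupby only groups adjacent items, which is what we want here
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        phrase = " ".join(word for word, _ in group)
        spans.append((phrase, tag))
    return spans


# Hand-written tags for illustration (assumption: not actual CoreNLP output)
tagged = [
    ("Nora", "PERSON"), ("Jani", "PERSON"), (",", "O"), ("a", "O"),
    ("single", "O"), ("person", "O"), (",", "O"), ("Matt", "PERSON"),
    ("Jani", "PERSON"), ("and", "O"), ("Susan", "PERSON"), ("Jani", "PERSON"),
    (",", "O"), ("husband", "O"), ("and", "O"), ("wife", "O"),
]

for span in group_by_tag(tagged):
    print(span)
```

Two caveats: punctuation tokens stay inside the "O" spans unless you filter them out first, and two different entities of the same type that sit next to each other with no "O" token between them would be merged into one span.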

sina-al, Apr 27 '18 14:04

I will try to write up some docs soon.

sina-al, Apr 27 '18 14:04

The first block of code runs into the error from #12 if I include 'tokenize, ssplit, pos'. The code that works for now is:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='ner', options={"ner.useSUTime": False})
# The line below throws CoreNLPServerError: Status code: [500]
# nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner', options={"ner.useSUTime": False})

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This is probably a problem on the CoreNLP server side. Thanks!

hexingren, Apr 27 '18 15:04