Poor NER performance?
I am using Stanza to run NER on short pieces of text such as business names or brand names. Here is one example:
import stanza

# BUILDING THE MODELS
#-----stanza
sen = stanza.Pipeline("en")
smlp = stanza.MultilingualPipeline()

# TESTING THE MODELS
name = 'The Port of Peri Peri'

print('stanza sen')
doc = sen(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            # one row per word: text, XPOS, UPOS, dependency relation, NER tag of the enclosing token
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

print('stanza smlp')
doc = smlp(name)
for sent in doc.sentences:
    for token in sent.tokens:
        for word in token.words:
            print(word.text, word.xpos, word.upos, word.deprel, token.ner)  # word.feats left out
print('-----------------')
stanza sen
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
stanza smlp
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
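For anyone reading the output: the last column uses BIOES-style tags (B- begins an entity, I- continues it, E- ends it, S- marks a single-token entity, O is outside any entity), so both pipelines are treating the entire string as one PERSON entity. Below is a minimal sketch of how those per-token tags collapse into entity spans. The helper name `bioes_to_spans` is my own and not part of Stanza, and the tokens and tags are copied by hand from the output above rather than read from a `doc` object.

```python
def bioes_to_spans(tokens, tags):
    """Collapse per-token BIOES tags into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration only; not a Stanza API.
    """
    spans = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: discard any unfinished span.
            current, current_type = [], None
            continue
        prefix, _, label = tag.partition("-")
        if prefix in ("B", "S"):
            # Start a new entity (S- is a one-token entity).
            current, current_type = [token], label
        elif prefix in ("I", "E") and current and label == current_type:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # Malformed sequence (e.g. I- with no preceding B-): skip it.
            current, current_type = [], None
            continue
        if prefix in ("E", "S"):
            # Entity is complete: record it and reset.
            spans.append((" ".join(current), current_type))
            current, current_type = [], None
    return spans


# Tokens and tags copied from the 'stanza sen' output above.
tokens = ["The", "Port", "of", "Peri", "Peri"]
tags = ["B-PERSON", "I-PERSON", "I-PERSON", "I-PERSON", "E-PERSON"]
print(bioes_to_spans(tokens, tags))
```

Run on the tags above, this yields a single span covering the full name, typed PERSON, which is the misclassification I am asking about.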
Obviously, 'The Port of Peri Peri' does not even look like a person's name. Is there any way I can improve Stanza's performance?