
Poor NER performance?

Open stajdini opened this issue 5 months ago • 1 comments

I am using Stanza to run NER on short pieces of text such as business names and brand names. Here is one example:


import stanza

# BUILDING THE MODELS
#-----stanza
sen = stanza.Pipeline("en")  # English-only pipeline
smlp = stanza.MultilingualPipeline()  # detects the language, then runs the matching pipeline

# TESTING THE MODELS
name = 'The Port of Peri Peri'

print('stanza sen')
doc = sen(name)
for sent in doc.sentences:
   for token in sent.tokens:
       for word in token.words:
           print(word.text, word.xpos, word.upos, word.deprel, token.ner)
print('-----------------')

print('stanza smlp')
doc = smlp(name)
for sent in doc.sentences:
   for token in sent.tokens:
       for word in token.words:
           print(word.text, word.xpos, word.upos, word.deprel, token.ner)#, word.feats)
print('-----------------')


stanza sen
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
stanza smlp
The DT DET det B-PERSON
Port NNP PROPN root I-PERSON
of IN ADP case I-PERSON
Peri NNP PROPN nmod I-PERSON
Peri NNP PROPN nmod E-PERSON
-----------------
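
For anyone reading the output above: the B-/I-/E- prefixes are BIOES tags, so both pipelines are labelling the whole five-token string as one PERSON entity rather than five separate ones. Below is a minimal sketch in plain Python, with no Stanza dependency, of how such per-token tags collapse into spans. The `bioes_to_spans` helper is hypothetical and written only for illustration; it is not part of Stanza, and it simply drops malformed tag sequences.

```python
def bioes_to_spans(tagged):
    """Collapse (token, tag) pairs carrying BIOES tags into (text, label) spans.

    Hypothetical helper for illustration only; not a Stanza API.
    """
    spans, current, label = [], [], None
    for token, tag in tagged:
        if tag == "O":
            # Outside any entity: reset whatever was in progress.
            current, label = [], None
            continue
        prefix, _, kind = tag.partition("-")
        if prefix in ("B", "S"):
            # Begin a new span (S is a single-token span).
            current, label = [token], kind
        elif prefix in ("I", "E") and kind == label:
            # Continue the span that is already open.
            current.append(token)
        else:
            # Malformed sequence (e.g. I- without a matching B-): drop it.
            current, label = [], None
            continue
        if prefix in ("S", "E"):
            # The span is complete: emit it and reset.
            spans.append((" ".join(current), label))
            current, label = [], None
    return spans


# Token/tag pairs copied from the output above.
tagged = [
    ("The", "B-PERSON"),
    ("Port", "I-PERSON"),
    ("of", "I-PERSON"),
    ("Peri", "I-PERSON"),
    ("Peri", "E-PERSON"),
]
spans = bioes_to_spans(tagged)
print(spans)
```
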

Obviously, 'The Port of Peri Peri' does not even look like a person's name. Is there any way I can improve Stanza's performance?

stajdini, Sep 24 '24 16:09