
Names in all capital letters are not detected

Open NuiMrme opened this issue 1 year ago • 1 comments

Describe the bug

When a name is written in all capital letters, the PERSON entity is not detected.

To Reproduce

from presidio_analyzer import AnalyzerEngine
analyzer = AnalyzerEngine()
# Same name in three casings: upper, title, lower.
print(analyzer.analyze(entities=["PERSON"], text="SOPHY SANTINO", language="en"))
print(analyzer.analyze(entities=["PERSON"], text="Sophy Santino", language="en"))
print(analyzer.analyze(entities=["PERSON"], text="sophy santino", language="en"))

Expected behavior

The PERSON entity should be detected in all three print calls.

NuiMrme, Apr 10 '24 07:04

Hi, this could be an issue with the default spaCy model, which has lower accuracy on names written in all capital letters. I would suggest trying a few approaches in the Presidio demo and considering a different model (e.g. one from Hugging Face or Flair).

omri374, Apr 14 '24 12:04
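
The model swap suggested in the reply could look roughly like the sketch below. It is only a sketch: the helper names (build_nlp_configuration, build_analyzer, compare_casings) are hypothetical and not part of Presidio, and it assumes Presidio is installed along with whichever spaCy model is passed in. It uses NlpEngineProvider to build the analyzer from a configuration dict instead of relying on the default model.

```python
from typing import Any, Dict, List


def build_nlp_configuration(model_name: str, lang_code: str = "en") -> Dict[str, Any]:
    """Return a Presidio NLP-engine configuration for a single spaCy model."""
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": lang_code, "model_name": model_name}],
    }


def build_analyzer(model_name: str, lang_code: str = "en"):
    """Create an AnalyzerEngine backed by the given spaCy model."""
    # Imported lazily so the configuration helper works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(
        nlp_configuration=build_nlp_configuration(model_name, lang_code)
    )
    return AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=[lang_code],
    )


def compare_casings(analyzer, name: str, lang_code: str = "en") -> Dict[str, List[Any]]:
    """Run PERSON detection on the upper-, title- and lower-case forms of a name."""
    variants = [name.upper(), name.title(), name.lower()]
    return {
        variant: analyzer.analyze(text=variant, entities=["PERSON"], language=lang_code)
        for variant in variants
    }
```

With the dependencies in place, compare_casings(build_analyzer("en_core_web_trf"), "Sophy Santino") would return the analyzer results for each casing side by side. The model name en_core_web_trf is just an illustrative choice, and whether it actually detects all-caps names better than the default has not been verified here.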