presidio
Names in all capital letters are not detected
Describe the bug
When a name is written in all capital letters, the PERSON entity is not detected.
To Reproduce
from presidio_analyzer import AnalyzerEngine
analyzer = AnalyzerEngine()  # default English NLP engine (spaCy)
print(analyzer.analyze(entities=["PERSON"], text="SOPHY SANTINO", language="en"))
print(analyzer.analyze(entities=["PERSON"], text="Sophy Santino", language="en"))
print(analyzer.analyze(entities=["PERSON"], text="sophy santino", language="en"))
Expected behavior
The PERSON entity should be detected in all three prints.
Hi, this could be an issue with the default spaCy model, which is less accurate on names written in all capital letters. I suggest trying a few approaches in the Presidio demo and considering a different model (e.g. one from Hugging Face or Flair).
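One way to try a different model is to build the AnalyzerEngine on top of a non-default NLP engine and rerun the three casings from the report. The sketch below assumes the configuration-dict shape accepted by presidio_analyzer.nlp_engine.NlpEngineProvider (nlp_engine_name plus a models list). The model name en_core_web_trf and the helper names build_nlp_config, make_analyzer and compare_casings are illustrative assumptions, not part of Presidio. Hugging Face or Flair models may need a different engine or a custom recognizer depending on your Presidio version, so check the docs before swapping them in.

```python
# Sketch (assumptions noted): compare PERSON detection across casings
# using an AnalyzerEngine built on a non-default NLP model.
from typing import Any, Dict


def build_nlp_config(nlp_engine_name: str = "spacy",
                     model_name: str = "en_core_web_trf") -> Dict[str, Any]:
    """Return an NlpEngineProvider-style config for one English model.

    The default model name is an assumption; it must be installed first
    (e.g. `python -m spacy download en_core_web_trf`).
    """
    return {
        "nlp_engine_name": nlp_engine_name,
        "models": [{"lang_code": "en", "model_name": model_name}],
    }


def make_analyzer(config: Dict[str, Any]):
    """Build an AnalyzerEngine from the config (needs presidio-analyzer)."""
    # Imported lazily so the helpers above can be used without Presidio.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=config)
    return AnalyzerEngine(nlp_engine=provider.create_engine(),
                          supported_languages=["en"])


def compare_casings(analyzer, name: str = "Sophy Santino") -> Dict[str, int]:
    """Count PERSON results for the upper, title and lower casing of a name."""
    counts = {}
    for text in (name.upper(), name.title(), name.lower()):
        found = analyzer.analyze(text=text, entities=["PERSON"], language="en")
        counts[text] = len(found)
    return counts
```

Usage, assuming Presidio and the model are installed: `print(compare_casings(make_analyzer(build_nlp_config())))`. A count of 0 for the upper-case string reproduces the bug with that model. Running the same comparison against the default engine (`compare_casings(AnalyzerEngine())`) shows whether the swap actually helps.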