
repeated label on valid name format: Surname, Givenname MiddleName

Open · matthoskins1980 opened this issue 5 years ago · 1 comment

ORIGINAL STRING: Bianchette, Michael David
PARSED TOKENS: [('Bianchette,', 'Surname'), ('Michael', 'GivenName'), ('David', 'Surname')]
UNCERTAIN LABEL: Surname

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

To report an error in labeling a valid name, open an issue at https://github.com/datamade/probablepeople/issues/new - it'll help us continue to improve probablepeople!
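Until the model handles this pattern, one possible stopgap for names in exactly the `Surname, GivenName MiddleName` shape is to label them with a plain rule and leave everything else to probablepeople. The helper below, `parse_comma_name`, is hypothetical and not part of probablepeople. It assumes every token after the given name is a middle name, and it returns `None` for anything that does not match, such as strings with no comma or with a second comma.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical workaround, not part of probablepeople: it only recognizes the
# "Surname, GivenName MiddleName ..." shape reported in this issue.
COMMA_NAME = re.compile(r"^\s*([^,\s][^,]*?)\s*,\s*([^,\s]+)((?:\s+[^,\s]+)*)\s*$")


def parse_comma_name(name: str) -> Optional[List[Tuple[str, str]]]:
    """Label a 'Surname, Given Middle...' string, or return None if it doesn't fit."""
    match = COMMA_NAME.match(name)
    if match is None:
        return None
    surname, given, rest = match.groups()
    # Split multi-word surnames into tokens; the comma stays on the last one.
    *head, last = surname.split()
    tokens = [(part, "Surname") for part in head]
    tokens.append((last + ",", "Surname"))
    tokens.append((given, "GivenName"))
    # Assumption: every remaining token is a middle name.
    tokens.extend((middle, "MiddleName") for middle in rest.split())
    return tokens


if __name__ == "__main__":
    print(parse_comma_name("Bianchette, Michael David"))
```

In practice you would call it only for the strings where probablepeople raises the repeated-label error above, so that names the model already parses correctly keep the model's labels.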

matthoskins1980, Oct 05 '19 20:10

Hi, all. Has there been any follow-up on this issue? I am seeing it as well. Out of a dataset of 320,000 names, probablepeople failed to parse about 19,000, and 11,000 of those failures were caused by this exact issue.

I tried following parserator's instructions for training the model with additional examples: I used parserator's label utility to create 11 examples and then trained my model on them. It says it wrote out an updated .crfsuite file, but I do not see an updated copy of this file anywhere, and the model's behavior has not changed. (The only .crfsuite files I see are the three that were installed with probablepeople, and they have retained their original last-modified timestamps.)
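One way to check where a training run actually wrote its output is to search for every `.crfsuite` file and sort the results by last-modified time. The helper below, `find_crfsuite_files`, is a hypothetical diagnostic and not part of parserator or probablepeople. It assumes the new file landed somewhere under the current directory or a `sys.path` entry (for example, relative to the directory the training command was run from rather than inside the installed package); that is a guess to verify, not documented behavior.

```python
import sys
import time
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def find_crfsuite_files(roots: Iterable[Path]) -> List[Tuple[float, Path]]:
    """Return (mtime, path) for every *.crfsuite file under the roots, newest first."""
    found: Dict[Path, float] = {}
    for root in roots:
        if not root.is_dir():
            continue  # skip sys.path entries that are zip files or missing dirs
        for path in root.rglob("*.crfsuite"):
            # Keyed by resolved path so overlapping roots don't report duplicates.
            found[path.resolve()] = path.stat().st_mtime
    return sorted(((mtime, path) for path, mtime in found.items()),
                  key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Search the working directory plus every directory on the import path.
    roots = [Path.cwd()] + [Path(p) for p in sys.path if p]
    for mtime, path in find_crfsuite_files(roots):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```

If the newest file listed is not one of the three installed with probablepeople, that is likely the retrained model; if nothing new appears, the training step probably did not write a model at all.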

jbrezovan, Nov 19 '21 20:11