probablepeople
repeated label on valid name format: Surname, Givenname MiddleName
ORIGINAL STRING: Bianchette, Michael David
PARSED TOKENS: [('Bianchette,', 'Surname'), ('Michael', 'GivenName'), ('David', 'Surname')]
UNCERTAIN LABEL: Surname
When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
To report an error in labeling a valid name, open an issue at https://github.com/datamade/probablepeople/issues/new - it'll help us continue to improve probablepeople!
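Until the model itself labels this format correctly, one possible stopgap is to post-process the parsed tokens. The sketch below is only a heuristic and not part of probablepeople: `relabel_surname_first` is a hypothetical helper that takes a list of (token, label) pairs like the PARSED TOKENS above and assumes that, when the first token is a Surname ending in a comma, any later Surname that follows a GivenName is really a MiddleName.

```python
def relabel_surname_first(tokens):
    """Heuristic fix for 'Surname, GivenName MiddleName' strings.

    tokens is a list of (token, label) pairs. If the first token is a
    Surname ending in a comma, any later Surname that appears after a
    GivenName is relabeled as MiddleName. Other input is returned as-is.
    """
    if len(tokens) < 3:
        return list(tokens)

    first_token, first_label = tokens[0]
    if first_label != "Surname" or not first_token.endswith(","):
        return list(tokens)

    fixed = [tokens[0]]
    seen_given = False
    for token, label in tokens[1:]:
        if label == "GivenName":
            seen_given = True
        elif label == "Surname" and seen_given:
            # Assumption: the real surname was already given before the comma.
            label = "MiddleName"
        fixed.append((token, label))
    return fixed


parsed = [("Bianchette,", "Surname"), ("Michael", "GivenName"), ("David", "Surname")]
print(relabel_surname_first(parsed))
```

In practice the pairs could come from `probablepeople.parse()`, which returns (token, label) tuples without raising on repeated labels. The heuristic will mislabel genuine double surnames written in this order, so it is worth spot-checking the results.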
Hi, all. Has there been any follow-up on this issue? I am seeing it as well. Out of a dataset of 320,000 names, probablepeople had trouble parsing about 19,000 of them, and 11,000 of those failures were because of this exact issue.
I tried following parserator's instructions for training the model with additional examples: I used parserator's label utility to create 11 examples and then trained my model on them. The output says it wrote an updated .crfsuite file, but I do not see an updated copy of this file anywhere, and the model's behavior has not changed. (The only .crfsuite files I see are the three that were installed with probablepeople, and they have retained their original last-modified timestamps.)
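One way to check whether a new model file was written somewhere unexpected is to search a directory tree for .crfsuite files and sort them by modification time. This is a minimal sketch using only the standard library; `find_crfsuite_files` is a hypothetical helper, and which directory to search (the working directory used for training, the installed package directory, and so on) is an assumption left to the reader.

```python
from datetime import datetime
from pathlib import Path


def find_crfsuite_files(root):
    """Return (path, modified time) pairs for every .crfsuite file
    under root, newest first."""
    found = [
        (path, datetime.fromtimestamp(path.stat().st_mtime))
        for path in Path(root).rglob("*.crfsuite")
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Search from the current directory; swap in any other starting point.
    for path, modified in find_crfsuite_files("."):
        print(modified.isoformat(timespec="seconds"), path)
```

If the newest file in the list still carries the original install timestamp, the training run most likely wrote its output outside the tree that was searched.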