probablepeople icon indicating copy to clipboard operation
probablepeople copied to clipboard

:family: a python library for parsing unstructured western names into name components.

Results 62 probablepeople issues
Sort by recently updated
recently updated
newest added

````Python >>> pp.tag('John Smith & Friends') (OrderedDict([('CorporationName', 'John Smith & Friends')]), 'Corporation') >>> pp.tag('John Smith and Friends') (OrderedDict([('GivenName', 'John'), ('Surname', 'Smith'), ('And', 'and'), ('SecondSurname', 'Friends')]), 'Household') ```` What do you...

Should tag() support this name? ```Python pp.tag('Bob Smith, John Jones, and Jacob Brown') ```` Exception ```` Traceback (most recent call last): File "", line 1, in File "c:\python27\lib\site-packages\probablepeople\__init__.py", line 132,...

I extracted [8964 entities from Wikidata related to the Wikipedia category Churches in the United States](https://github.com/az0/entity-metadata/blob/master/data/wikidata-church.csv), and then I manually filtered the list to 7000 records that really look like...

E.g.: ```python In [2]: probablepeople.tag('Solus Christi Brothers') RepeatedLabelError: ERROR: Unable to tag this string because more than one area of the string has the same label ORIGINAL STRING: Solus Christi...

bad parse

RepeatedLabelError is trying to construct an error message that concatenates several strings, and it fails when the name it is tagging has Unicode. ````Python import probablepeople as pp pp.tag(u'El Rey...

I have a large data set, which includes many Koreans with Korean names living in the United States. The names are purely written in Latin characters, but sometimes the parser...

Would you please document the the labels? I am labeling examples to submit as a pull request, and I would like to be consistent. **Issue 1** What exactly is the...

The name "Heidi Van Melt" is marked up [here](https://github.com/datamade/probablepeople/blob/560691e55dd995d30fd23662e7be0864fb49bc62/name_data/labeled/person_labeled.xml#L3018) as `Heidi Van Pelt` This appears to be incorrect. I would suggest: `Heidi Van Pelt` or `Heidi Van Pelt`

This page documents 19 labels https://probablepeople.readthedocs.io/en/latest/ However, it is missing these labels CorporationCommitteeType CorporationNameAndCompany CorporationNameBranchIdentifier CorporationNameBranchType OtherCorporationName SecondCorporationCommitteeType SecondCorporationName SecondCorporationNameAndCompany SecondCorporationNameBranchIdentifier SecondFirstInitial SecondGivenName SecondMiddleInitial SecondMiddleName SecondPrefixMarital SecondSuffixOther SecondSurname

documentation

ORIGINAL STRING: Randie John Mcdonald PARSED TOKENS: [('Randie', 'Surname'), ('John', 'GivenName'), ('Mcdonald', 'Surname')] UNCERTAIN LABEL: Surname perhaps this is purposeful considering some format which would make the full middle name...

bad parse