python-nameparser
nickname + last name
Names such as
"Rick" Edmonds
are parsed in such a way that "Edmonds" is treated as the first name rather than the last name.
So I think this is what you want?
$ python tests.py '"Rick" Edmonds'
<HumanName : [
title: ''
first: ''
middle: ''
last: 'Edmonds'
suffix: ''
nickname: 'Rick'
]>
What do you think about the first name field remaining blank in this case?
Right now the nickname handling happens as a preprocess without any awareness of where the nickname appears in the string. I had planned to refactor the nickname handling a bit in order to better support maiden names (#22) that happen after a last name.
Right now I could fix it in this single case where there are no titles or other name pieces, but it seems like it should also support things like 'Senator "Rich" Edmonds' too, and that will need to wait for a slightly bigger refactor that moves that code into the parse logic and takes position in the string into account.
Right, my expectation would be that the first name remains blank. A narrow fix for names in the form of a nickname followed by a last name (and nothing else) would be sufficient for my needs at the moment!
I just released v1.0.2 which has a narrow fix for that one example. I'm going to leave this open and close it when I update the nickname handling logic.
Sounds great, thanks! Just adding a follow-up issue for the coming update: "Rick" Edmonds now parses perfectly, but "Rick" Edmonds Jr. (or "Rick" Edmonds III, etc.) flies off the rails a bit:
In [3]: HumanName('"Rick" Edmonds Jr.')
Out[3]:
<HumanName : [
title: ''
first: 'Edmonds'
middle: ''
last: 'Jr.'
suffix: ''
nickname: 'Rick'
]>
In [4]: HumanName('"Rick" Edmonds, Jr.')
Out[4]:
<HumanName : [
title: 'Jr.'
first: ''
middle: ''
last: 'Edmonds'
suffix: ''
nickname: 'Rick'
]>
Yea, that makes sense. I didn't make any changes to the codepaths that handle the comma formats, I'm guessing that behavior is unchanged from the previous version. I literally hardcoded it to only handle 2 name parts, which is why it fails when you add "jr" to the end.
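To illustrate why a two-part special case stops applying as soon as a suffix is added, here is a minimal sketch. It is not the actual nameparser code: `narrow_nickname_fix` and `NICK_LAST` are hypothetical names, and the regex is only an assumption about what "exactly a nickname plus a last name" could look like.

```python
import re

# Hypothetical helper, not nameparser's real code: it only recognizes
# exactly two pieces, a double-quoted nickname followed by one last name.
NICK_LAST = re.compile(r'^\s*"([^"]+)"\s+(\S+)\s*$')


def narrow_nickname_fix(full_name):
    """Return name fields for '"Nick" Last', or None for anything else."""
    match = NICK_LAST.match(full_name)
    if match is None:
        # Anything with more pieces falls through to the general parse.
        return None
    return {"first": "", "last": match.group(2), "nickname": match.group(1)}


print(narrow_nickname_fix('"Rick" Edmonds'))
print(narrow_nickname_fix('"Rick" Edmonds Jr.'))   # three pieces, no match
print(narrow_nickname_fix('"Rick" Edmonds, Jr.'))  # comma format, no match
```

With a special case like this, any input that has a third piece skips the fix and goes through the ordinary parse, where nothing knows that the nickname should stand in for the first name.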
Trying to think a bit about the final behavior we want. Is it true that nicknames only happen after first names? I feel like these are the cases I know about that we want to handle in some way:
- Nickname - Robert "Bob" Jones
  - [title] "Bob" [middle] Jones [suffix], [suffix]
  - Jones [suffix], [title] "Bob" [middle]
- Maiden Name - Roberta Jones (Smith)
  - [title] Roberta [middle] Jones (Smith), [suffix]
  - Jones (Smith) [suffix], [title] Roberta [middle], [suffix]
- Junk - John Jones (Google Docs), Jr. (Unknown)
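As a rough sketch of what "taking position into account" could start from, the snippet below pulls out every quoted or parenthesized chunk along with where it starts in the string. `find_delimited` and `DELIMITED` are hypothetical names, not part of nameparser, and the sketch does not try to tell a maiden name from junk, since both appear in parentheses.

```python
import re

# Matches either a double-quoted chunk or a parenthesized chunk.
DELIMITED = re.compile(r'"([^"]*)"|\(([^)]*)\)')


def find_delimited(full_name):
    """Return (kind, text, start_index) for each quoted or parenthesized chunk."""
    found = []
    for match in DELIMITED.finditer(full_name):
        if match.group(1) is not None:
            found.append(("quoted", match.group(1), match.start()))
        else:
            found.append(("parenthesized", match.group(2), match.start()))
    return found


for example in (
    'Robert "Bob" Jones',
    'Roberta Jones (Smith)',
    'John Jones (Google Docs), Jr. (Unknown)',
):
    print(example, "->", find_delimited(example))
```

Keeping the start index around is the piece the current preprocess step throws away; a later stage could use it to decide whether a quoted chunk sits where a first name would normally be.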
When I first implemented this it was just to handle the junk, so I haven't thought too much about the other cases. This is helping me understand how a nickname is different.
I think when there is a nickname at the beginning of the string, i.e. the name may have a title but is missing a first name, we basically want it to behave as if the first name slot has been filled by the nickname and then let the rest of the parse happen as normal. I don't think a maiden name will ever appear without a last name, so it won't need that kind of handling. And for the junk, it probably doesn't matter where it ends up as long as it's not filling up a name slot.
Let me know if you think that sounds right or if you have any examples of things in quotes or parentheses that are not one of those 3 types of things.
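A toy sketch of that rule, under heavy assumptions: `toy_parse`, `TITLES`, and `SUFFIXES` are hypothetical and much smaller than anything nameparser really uses, commas are simply discarded (so the "Last, First" formats are not handled), and parenthesized pieces are ignored entirely. It only shows the idea that a leading nickname occupies the first-name slot while the first field itself stays blank.

```python
import re

# Illustrative word lists only; the real title/suffix sets are far larger.
TITLES = {"senator", "dr", "mr", "mrs", "ms"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _norm(token):
    """Lowercase a token and drop trailing periods for list lookups."""
    return token.rstrip(".").lower()


def toy_parse(full_name):
    """Parse a name so that a leading nickname takes the first-name slot."""
    # A quoted chunk is one token; otherwise split on whitespace and commas.
    tokens = re.findall(r'"[^"]+"|[^\s,]+', full_name)
    fields = {"title": "", "first": "", "middle": "",
              "last": "", "suffix": "", "nickname": ""}

    # Peel titles off the front and suffixes off the back.
    while tokens and _norm(tokens[0]) in TITLES:
        fields["title"] = (fields["title"] + " " + tokens.pop(0)).strip()
    suffixes = []
    while tokens and _norm(tokens[-1]) in SUFFIXES:
        suffixes.insert(0, tokens.pop())
    fields["suffix"] = ", ".join(suffixes)

    first_slot_filled = False
    if tokens and tokens[0].startswith('"'):
        # Nickname where the first name would be: it fills that slot.
        fields["nickname"] = tokens.pop(0).strip('"')
        first_slot_filled = True
    else:
        # Nickname elsewhere: pull it out and parse the rest as usual.
        for index, token in enumerate(tokens):
            if token.startswith('"'):
                fields["nickname"] = tokens.pop(index).strip('"')
                break

    if not first_slot_filled and tokens:
        fields["first"] = tokens.pop(0)
    if tokens:
        fields["last"] = tokens.pop()
    fields["middle"] = " ".join(tokens)
    return fields


for example in ('"Rick" Edmonds Jr.', 'Senator "Rich" Edmonds', 'Robert "Bob" Jones'):
    print(example, "->", toy_parse(example))
```

The design choice here is to strip titles and suffixes first, so that "is the nickname at the start?" is asked about what remains rather than about the raw string.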
Here is where I was encountering the names in the form of [nickname last]. From comparing the list there, it appears they all conform to one of the three.