python-nameparser

Processing "de Meunier" doesn't recognize the prefix

Open akimotode opened this issue 4 years ago • 5 comments

I'm not sure if I am missing something, but if I run the parser on the string "de Mesnil", I am expecting it to give me either a first or a last name of "de Mesnil" (preferably the latter), given that "de" is a known prefix.

Instead I am getting a first name "de" and a last name "Mesnil".

That seems contradictory to the documentation for prefixes: "Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece."
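To make the documented rule concrete, here is a toy sketch of "prefixes join to the piece that follows them". It is not nameparser's implementation; the function name `join_prefixes` and the small prefix set are assumptions made for illustration only.

```python
# Toy illustration of the documented rule, NOT nameparser's actual code.
# The prefix set below is an illustrative assumption, not the library's list.
def join_prefixes(pieces, prefixes=frozenset({"de", "van", "der", "st", "st."})):
    """Join each run of prefix pieces to the piece that follows it."""
    joined = []
    pending = []  # prefix pieces waiting for the piece they attach to
    for piece in pieces:
        if piece.lower() in prefixes:
            pending.append(piece)
        else:
            joined.append(" ".join(pending + [piece]))
            pending = []
    # Trailing prefixes with nothing to attach to stay as their own pieces.
    joined.extend(pending)
    return joined


print(join_prefixes(["Walter", "de", "Mesnil"]))
print(join_prefixes(["de", "Mesnil"]))
```

Under this reading of the rule, "de Mesnil" would collapse into a single piece, which is the behaviour I expected.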

akimotode, Mar 17 '21 00:03

I have not looked into the code to see for sure, but I believe the parser treats a prefix as a name piece instead of a prefix when there are only two space-separated strings in total. This would be helpful for people whose first names clash with prefixes; I'd have to check whether there are any specific examples in the tests.

I'm curious about your use case. Do you have examples in your data that occasionally include only last names, and you want the parser to tell you that it is indeed a last name?

derek73, Mar 17 '21 18:03

Thanks for the quick answer!

I believe my use case is what you describe. In this specific case, the text includes three versions of the "name" in different places: "Sergeant de Mesnil", "Walter de Mesnil" and "de Mesnil".

After adding "sergeant" as a custom title I get three different parsings:

"Sergeant de Mesnil" --> {title: "sergeant", last_name: "de Mesnil"}
"Walter de Mesnil" --> {first_name: "Walter", last_name: "de Mesnil"}
"de Mesnil" --> {first_name: "de", last_name: "Mesnil"}

On a side note: it would be neat if there was an explicit LAST_NAME_TITLE option for titles. This would be handy for military titles like General, Colonel, Major, etc., as well as most nobility titles outside of King/Queen and Lord/Lady. I think it sort of works out of the box, but I was surprised not to see it made explicit.

akimotode, Mar 18 '21 21:03

There is a set of titles that, when followed by a single name, cause that name to be treated as a first name (it looks like it's not exposed in the documentation, though):

https://github.com/derek73/python-nameparser/blob/d498968e850577ffc4dfa01c27610500a9ef3a80/nameparser/config/titles.py#L4

All other titles go through the rest of the normal parsing process, so the name that follows is assumed to be a last name because there's more than one name part.

The set currently includes King/Queen but not Lady/Lord; maybe it should. The Wikipedia page makes me think it could go either way: https://en.wikipedia.org/wiki/Lady
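As a rough illustration of the rule described above (a title followed by a single name), here is a toy sketch. It is not the library's code; `assign_single_name` is a hypothetical helper, and the default set contains only the King/Queen entries mentioned in this comment.

```python
# Toy illustration only; nameparser's real logic lives in its parser and config.
# The default set holds just the two titles named in this thread.
def assign_single_name(title, name, first_name_titles=frozenset({"king", "queen"})):
    """Decide whether the single name after a title is a first or a last name."""
    key = title.lower().rstrip(".")
    if key in first_name_titles:
        return {"title": title, "first": name, "last": ""}
    # Every other title falls through to the default: treat the name as a last name.
    return {"title": title, "first": "", "last": name}


print(assign_single_name("Queen", "Elizabeth"))
print(assign_single_name("Sergeant", "de Mesnil"))
```

Adding "lord" and "lady" to the set passed in would flip those titles to first-name handling in this sketch.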

derek73, Mar 19 '21 17:03

FYI, I have worked around the issue for now by checking and fixing the output after parsing, for the known prefix cases I have in my data.

    if human_name.first in ['de', 'st', 'st.', 'van']:
        human_name.last = human_name.first + " " + human_name.last
        human_name.first = ""

I think the default behaviour could (should?) be similar to the above: if the original is "prefix name", the output should be last = prefix + " " + name instead of first = prefix & last = name.
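For anyone who wants to reuse the workaround, here is a minimal sketch of the same check as a standalone function that works on plain strings. The name `merge_leading_prefix` and the case-insensitive comparison are my own assumptions; the prefix list is just the one from my data.

```python
# Post-processing sketch; operates on plain strings so it can be tried without the parser.
# KNOWN_PREFIXES mirrors the list above and is specific to my data, not the library's set.
KNOWN_PREFIXES = {"de", "st", "st.", "van"}


def merge_leading_prefix(first, last, prefixes=KNOWN_PREFIXES):
    """If the parsed first name is really a prefix, fold it into the last name."""
    if first.lower() in prefixes and last:
        return "", f"{first} {last}"
    return first, last


print(merge_leading_prefix("de", "Mesnil"))
print(merge_leading_prefix("Walter", "de Mesnil"))
```

With a parsed name, the idea would be to pass in human_name.first and human_name.last and assign the returned pair back to the same attributes.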

Thanks for the pointer on the FIRST_NAME_TITLES. Using it now.

akimotode, Mar 23 '21 22:03

I'm running into something similar with my name, Patrick van der Leer. In Dutch we call the "van der" part a tussenvoegsel. Even "Patrick van Leer" gives me "van Leer" as the surname/last_name and nothing for the middle name.


EDIT

https://github.com/derek73/python-nameparser/blob/8b73ff9e0aed23285f451cfa7091e47e9835a608/tests.py#L2069

This was not what I was expecting. "van der" is part of the full surname/last name, yes, but I would set "van der" as a middle name or as a prefix of the surname.
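To show what I mean, here is a small sketch that splits a parsed last name into the tussenvoegsel and the base surname after parsing. `split_tussenvoegsel` is a hypothetical helper, not part of nameparser, and the particle set is a short, non-exhaustive assumption.

```python
# Hypothetical post-processing helper; not part of nameparser.
# The particle set is a small illustrative sample of Dutch tussenvoegsels.
def split_tussenvoegsel(last_name, particles=frozenset({"van", "der", "de", "den", "ter"})):
    """Split 'van der Leer' into ('van der', 'Leer')."""
    words = last_name.split()
    i = 0
    # Always leave at least one word for the base surname.
    while i < len(words) - 1 and words[i].lower() in particles:
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])


print(split_tussenvoegsel("van der Leer"))
print(split_tussenvoegsel("van Leer"))
```

This keeps the parser's output untouched and only adds a separate prefix field on top of the last name it already returns.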

patvdleer, Jan 30 '22 20:01