python-nameparser
Initials Formatting
I wanted to remove any extraneous characters from the initials and only have the initials with no punctuation or whitespace. In the process I stumbled upon two shortcomings with the formatting of initials.
Setting initials delimiter to empty string
>>> from nameparser import HumanName
>>> HumanName('Doe, John A.').initials()
'J. A. D.'
>>> HumanName('Doe, John A.', initials_delimiter='').initials()
'J. A. D.' <= EXPECTED 'J A D'
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.initials_delimiter = ''
>>> HumanName('Doe, John A.').initials()
'J A D'
>>> HumanName('Doe, John A.', initials_format='{first}{middle}{last}').initials()
'JAD'
It seems that while initials_delimiter can be set to an empty string via CONSTANTS, it cannot be set that way via the keyword argument on HumanName. Presumably, this is because an empty string evaluates to False here:
https://github.com/derek73/python-nameparser/blob/759a1316f2fda4395714f36d777fd014dcdd51b0/nameparser/parser.py#L99
I would expect this could be fixed by changing that line to:
self.initials_delimiter = initials_delimiter if initials_delimiter is not None else self.C.initials_delimiter
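To illustrate the suspected cause, here is a minimal sketch that does not use nameparser at all. It assumes the linked line falls back to the default with a truthiness test (such as `or`), which I have not confirmed. `DEFAULT_DELIMITER` and both function names are made up for this example.

```python
# Stand-in for CONSTANTS.initials_delimiter (hypothetical name, not the library's).
DEFAULT_DELIMITER = "."


def resolve_with_or(delimiter=None):
    # Truthiness-based fallback: '' is falsy, so the default silently wins.
    return delimiter or DEFAULT_DELIMITER


def resolve_with_none_check(delimiter=None):
    # Only fall back when the caller passed nothing at all.
    return delimiter if delimiter is not None else DEFAULT_DELIMITER


print(repr(resolve_with_or("")))          # '.'
print(repr(resolve_with_none_check("")))  # ''
print(repr(resolve_with_none_check()))    # '.'
```

If the constructor does work this way, the `is not None` check would let an explicit empty string through while keeping the default for callers who omit the argument.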
Removing all whitespace from initials is not possible with multi-part names
>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.initials_delimiter = ''
>>> HumanName('Doe, John A. Kenneth', initials_format='{first}{middle}{last}').initials()
'JA KD' <= EXPECTED 'JAKD'
>>> HumanName('Doe, John A. Kenneth', initials_delimiter='.', initials_format='{first}{middle}{last}').initials()
'J.A. K.D.' <= EXPECTED 'JAKD'
This one is not so easy to fix. The code joins the parts together with a space hard-coded in.
https://github.com/derek73/python-nameparser/blob/759a1316f2fda4395714f36d777fd014dcdd51b0/nameparser/parser.py#L270-L277
You could require the space to be part of the delimiter, but that might produce odd output for certain formats (e.g., {last}, {first} {middle}), and it would be a backward-incompatible change for anyone who has already defined custom delimiters. Maybe a separate setting needs to be defined for this, although I have no idea what to name it.
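As a rough sketch of what a separate setting could look like, here is a simplified stand-alone model of the formatting step. This is not the library's code: `format_initials` and the `separator` parameter are hypothetical names, and the real implementation handles more cases than this does.

```python
def format_initials(first, middle, last, delimiter=".", separator=" ",
                    fmt="{first} {middle} {last}"):
    """Simplified model: each argument is a list of name parts.

    `delimiter` follows every initial; `separator` (hypothetical setting)
    joins the initials inside one group, replacing the hard-coded space.
    """
    def group(parts):
        return separator.join(part[0] + delimiter for part in parts if part)

    return fmt.format(first=group(first), middle=group(middle), last=group(last))


parts = (["John"], ["A.", "Kenneth"], ["Doe"])

# Default separator reproduces the behaviour reported above.
print(format_initials(*parts, fmt="{first}{middle}{last}"))  # J.A. K.D.

# An empty separator plus an empty delimiter removes all whitespace.
print(format_initials(*parts, delimiter="", separator="",
                      fmt="{first}{middle}{last}"))  # JAKD
```

Defaulting the new setting to a single space would keep existing output unchanged, so anyone with custom delimiters would not be affected unless they opt in.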
In the end, I worked around both issues with ''.join(name.initials_list()), but it would be nice to be able to have full control with the provided formatting options.