aclpub2

author last name "de Lhoneux" incorrectly uppercased to "De Lhoneux"?

Open nschneid opened this issue 2 years ago • 6 comments

As reported by @mdelhoneux in acl-org/acl-anthology#3208

I wonder if https://github.com/rycolab/aclpub2/blob/47dc3d2b896aa359e984d8e6e37ac57a8bd80acd/openreview/util.py#L62-L63 might be the culprit.

nschneid avatar Apr 19 '24 03:04 nschneid

It looks like the heuristic implemented there is that each word of the name that is all-lowercase or all-uppercase is converted to initial capitalization.

It might be better to tweak the capitalization of the words of the name only if none of the words of the name distinguish uppercase and lowercase, i.e.:

if len(last_name) > 2:
    # name does not contain any words with both uppercase and lowercase characters;
    # impose initial-only capitalization for each word
    if all(n.isupper() or n.islower() for n in last_name.split(" ")):
        last_name = " ".join([n[0].upper() + n[1:].lower() if (n == n.upper() or n == n.lower()) else n for n in last_name.split(" ")])

UPDATE: realized the inline if condition is redundant

if len(last_name) > 2:
    # name does not contain any words with both uppercase and lowercase characters;
    # impose initial-only capitalization for each word
    if all(n.isupper() or n.islower() for n in last_name.split(" ")):
        last_name = " ".join([n[0].upper() + n[1:].lower() for n in last_name.split(" ")])

nschneid avatar Apr 19 '24 04:04 nschneid

I would love to see a list of names as exported from OpenReview alongside the output of this function. We should really have a unit or regression test for this function: it is very important, and getting it wrong causes a lot of corrections and headaches downstream.
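As a starting point for such a test, here is a minimal sketch that runs both heuristics over a list of names and reports where they disagree. The function names and the sample names are hypothetical, and `capitalize_per_word` is reconstructed from the description earlier in this thread, not copied from `util.py`; a real test would use names exported from OpenReview.

```python
def capitalize_per_word(last_name: str) -> str:
    """Heuristic as described in this thread (an assumption, not the util.py code):
    each word that is all-lowercase or all-uppercase gets initial capitalization."""
    if len(last_name) <= 2:
        return last_name
    return " ".join(
        w[0].upper() + w[1:].lower() if (w.isupper() or w.islower()) else w
        for w in last_name.split(" ")
    )


def capitalize_if_caseless(last_name: str) -> str:
    """Proposed variant: change the name only if no word mixes upper and lower case."""
    words = last_name.split(" ")
    if len(last_name) <= 2:
        return last_name
    if not all(w.isupper() or w.islower() for w in words):
        # At least one word carries deliberate mixed casing; trust the author's input.
        return last_name
    return " ".join(w[0].upper() + w[1:].lower() for w in words)


def differing(names):
    """Return (input, per-word output, proposed output) where the two heuristics disagree."""
    rows = []
    for name in names:
        old, new = capitalize_per_word(name), capitalize_if_caseless(name)
        if old != new:
            rows.append((name, old, new))
    return rows


# Hypothetical sample; replace with an export of real author names.
SAMPLE = ["de Lhoneux", "SMITH", "van der berg", "McDonald", "Li"]

for name, old, new in differing(SAMPLE):
    print(f"{name!r}: per-word -> {old!r}, proposed -> {new!r}")
```

Note that an entirely lowercase name such as "van der berg" is still fully capitalized by both variants, so the proposed change only helps when the author typed at least one word in mixed case.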

mjpost avatar Jun 11 '24 14:06 mjpost

Hi @mjpost

I will do so when I download the list of authors from the next EMNLP. I will apply the update suggested by @nschneid so we can see the difference and maybe "fine-tune" it.

crux82 avatar Jun 11 '24 20:06 crux82

There was a similar issue for hyphenated names, which was addressed in #159. Maybe we can take a similar approach here? Basically, it uses what people provide as ground truth and always capitalizes the first and last. The middle is only capitalized if the person capitalized it in their softconf profile (so this would need to be adapted for OpenReview).

Potentially worth having shared functionality in a file of its own, so that we avoid duplicates :)

zeeraktalat avatar Aug 23 '24 15:08 zeeraktalat

This is still causing issues: https://github.com/acl-org/acl-anthology/issues/5560 https://github.com/acl-org/acl-anthology/issues/5561

mbollmann avatar Jul 28 '25 15:07 mbollmann

Ideally we fix this here, but we should also do better capitalization on the Anthology side. One easy option is a case-insensitive lookup against our names database that returns the correct capitalization.
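A rough sketch of what that lookup could look like, assuming the names database can be read as a plain list of known last names. The function names and sample data are hypothetical and do not reflect the Anthology's actual code or data.

```python
def build_index(known_names):
    """Map each case-folded name to the set of capitalizations seen for it."""
    index = {}
    for name in known_names:
        index.setdefault(name.casefold(), set()).add(name)
    return index


def canonical_capitalization(name, index):
    """Return the known capitalization if it is unambiguous, else the input unchanged."""
    candidates = index.get(name.casefold(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    # Unknown name, or several people use different capitalizations: do not guess.
    return name


# Hypothetical database contents.
index = build_index(["de Lhoneux", "Post", "Van Dijk", "van Dijk"])

print(canonical_capitalization("De Lhoneux", index))
print(canonical_capitalization("van dijk", index))
```

Leaving ambiguous and unknown names untouched is a deliberate choice in this sketch: when the database holds more than one capitalization for the same spelling, picking one would reintroduce the kind of error this issue is about.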

mjpost avatar Jul 28 '25 15:07 mjpost