nlprankings
Deduplication needed
Interesting project, but you all need some way to merge author entities. I'm at the U of Arizona and see myself listed twice: once as 'Tom Hicks' and once as 'Thomas Hicks'. My co-author 'Marco A. Valenzuela-Escárcega' has this issue too, and that's just at our institution, so I suspect there are several others.
@hickst Thanks for pointing this out. We are hoping the community will help us fix these issues, since they are hard to detect automatically. We will look into this and let you know soon.
@jdchoi77 Is there a way we can fix this by ourselves or do you have to make the changes? If the latter, is there a procedure to ask for a change (e.g. file a GitHub issue, fill out a form, make a pull request)?
@hickst Thank you for reporting this. After some digging, I believe this issue is the result of having two separate author IDs on the ACL Anthology. For instance, you have two separate author pages, one for Tom Hicks and one for Thomas Hicks. Your co-author Marco A. Valenzuela-Escárcega likewise has two pages: Marco A. Valenzuela-Escárcega and Marco Valenzuela-Escárcega.
To fix this, please make a pull request and update the fields authors and author_id in the respective JSON file located under the directory dat/acl_anthology/json/.
Your author_id is essentially the last part of the URL of your author page: tom-hicks for Tom Hicks and thomas-hicks for Thomas Hicks. Your publication under the name Thomas Hicks is P15-4022. If you would like to go with the name Tom Hicks instead of Thomas Hicks, go to the respective venue JSON file, which is P15-4.json, and for the publication P15-4022 update your name from Hicks, Thomas to Hicks, Tom and your author_id from thomas-hicks to tom-hicks.
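For anyone who prefers to script the change, here is a rough sketch of the edit described above. It is not part of the repository. It assumes each venue file is a JSON list of publication records with an "id" field plus parallel "authors" and "author_id" lists; that layout is a guess, so check the actual file before running anything. The function names and the example URL are made up for illustration.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def author_id_from_url(url: str) -> str:
    """Return the last non-empty path segment of an author page URL."""
    return urlparse(url).path.rstrip("/").split("/")[-1]


def merge_author(records, pub_id, old_id, new_id, new_name):
    """Rewrite one author's name and ID on a single publication, in place.

    ASSUMPTION: each record looks like
    {"id": ..., "authors": [...], "author_id": [...]} with parallel lists.
    Returns True if a change was made, False otherwise.
    """
    for record in records:
        if record.get("id") != pub_id:
            continue
        ids = record.get("author_id", [])
        if old_id in ids:
            i = ids.index(old_id)
            ids[i] = new_id
            record["authors"][i] = new_name
            return True
    return False


def fix_file(path, pub_id, old_id, new_id, new_name):
    """Load a venue JSON file, apply merge_author, write back only if changed."""
    path = Path(path)
    records = json.loads(path.read_text(encoding="utf-8"))
    changed = merge_author(records, pub_id, old_id, new_id, new_name)
    if changed:
        # ensure_ascii=False keeps names such as "Escárcega" readable in the file.
        text = json.dumps(records, ensure_ascii=False, indent=2) + "\n"
        path.write_text(text, encoding="utf-8")
    return changed
```

Under those assumptions, a call such as fix_file("dat/acl_anthology/json/P15-4.json", "P15-4022", "thomas-hicks", "tom-hicks", "Hicks, Tom") would make the change for the publication mentioned above. Note that rewriting the file with json.dumps may alter its indentation, so review the diff before opening the pull request.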
Your co-author 'Marco A. Valenzuela-Escárcega' can follow the same approach to resolve his duplication problem as well!
Thanks again for your feedback.