dblp
Perform author name disambiguation to produce new mapping
From the data, it appears the AMiner group did not perform any name disambiguation. This has led to a dataset with quite a few duplicate author records. This package currently does not address these issues.
The most obvious examples are those where the first or second name is abbreviated with a single letter in one place and spelled out fully in another. Use of dots and/or hyphens in some places also leads to different entity mappings. Another case that is quite common is when hyphenated names are spelled in some places with the hyphen and in some without.
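As a rough illustration of the abbreviation and punctuation cases above, here is a minimal sketch that flags two name strings as candidate duplicates. It is not part of this package. The function names (`name_tokens`, `names_may_match`) are hypothetical, and it assumes the last token is the surname and that both spellings have the same number of name parts.

```python
import re


def name_tokens(name):
    """Lowercase a name and split it on whitespace, dots, and hyphens."""
    return [tok for tok in re.split(r"[\s.\-]+", name.lower()) if tok]


def tokens_compatible(a, b):
    """True if two tokens are equal, or one is a single-letter initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_may_match(name_a, name_b):
    """Conservative check: same surname, and every given-name token is compatible."""
    ta, tb = name_tokens(name_a), name_tokens(name_b)
    if not ta or not tb or len(ta) != len(tb):
        return False
    # Assumption: the final token is the surname and must match exactly.
    if ta[-1] != tb[-1]:
        return False
    return all(tokens_compatible(a, b) for a, b in zip(ta[:-1], tb[:-1]))


if __name__ == "__main__":
    print(names_may_match("J. Smith", "John Smith"))
    print(names_may_match("Wei-Ying Ma", "Wei Ying Ma"))
    print(names_may_match("John Smith", "Jane Smith"))
```

A match here only marks a pair as worth reviewing: "J. Smith" is compatible with both "John Smith" and "Jane Smith", so initials alone cannot settle which record it belongs to.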
There are also simple common misspellings, although these are harder to detect, since an edit distance of 1 or 2 could just as easily indicate a completely different name. One case that might be distinguishable is when the edit is the deletion of a letter from a run of one or more of that same letter. For instance, "Acharya" vs. "Acharyya". Here it is likely that the second spelling simply has an extraneous y.
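The repeated-letter case can be checked without a general edit-distance computation. The sketch below is one possible way to do it, with hypothetical function names: it breaks each string into runs of identical letters and reports a match only when the two strings have the same sequence of letters and exactly one run differs in length by one.

```python
from itertools import groupby


def letter_runs(s):
    """Return (letter, run_length) pairs, e.g. 'yya' -> [('y', 2), ('a', 1)]."""
    return [(ch, len(list(group))) for ch, group in groupby(s.lower())]


def differs_by_one_repeated_letter(a, b):
    """True if one string is the other with a single letter of a run duplicated."""
    runs_a, runs_b = letter_runs(a), letter_runs(b)
    # The sequence of distinct letters must be identical in both strings.
    if [ch for ch, _ in runs_a] != [ch for ch, _ in runs_b]:
        return False
    # Exactly one run may differ, and only by a single letter.
    total_diff = sum(abs(n - m) for (_, n), (_, m) in zip(runs_a, runs_b))
    return total_diff == 1


if __name__ == "__main__":
    print(differs_by_one_repeated_letter("Acharya", "Acharyya"))
    print(differs_by_one_repeated_letter("Acharya", "Acharia"))
```

This still produces false positives, since pairs such as "Ana" and "Anna" are distinct names, so it is better suited to generating candidates for review than to merging records automatically.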
@macks22 Any leads for this author name disambiguation? This indeed leads to duplicates resulting in inaccurate results for various tasks.
You might try using the Python nameparser package. I've had good luck with that in the past. Please submit a pull request if you manage to make some progress on that.