Perform author name disambiguation to produce new mapping

Open macks22 opened this issue 10 years ago • 2 comments

From the data, it appears the AMiner group did not perform any name disambiguation. This has led to a dataset with quite a few duplicate author records. This package currently does not address these issues.

The most obvious examples are those where the first or second name is abbreviated with a single letter in one place and spelled out fully in another. Use of dots and/or hyphens in some places also leads to different entity mappings. Another case that is quite common is when hyphenated names are spelled in some places with the hyphen and in some without.
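As a rough illustration of the cases above, here is a minimal sketch (not part of this package) that treats dots, hyphens, and whitespace as separators and lets a single-letter initial match any name part starting with that letter. The function names `tokens` and `compatible` are hypothetical, and the sketch assumes both spellings have the same number of name parts, so it will miss a dropped middle name or a hyphenated name written as one joined word.

```python
import re


def tokens(name):
    # Lowercase, then treat dots, hyphens, and whitespace as separators.
    return [t for t in re.split(r"[.\-\s]+", name.lower()) if t]


def compatible(a, b):
    # Two spellings are candidate duplicates when they have the same
    # number of parts and every pair of parts is either identical or
    # an initial paired with a part starting with the same letter.
    ta, tb = tokens(a), tokens(b)
    if not ta or len(ta) != len(tb):
        return False
    for x, y in zip(ta, tb):
        if x == y:
            continue
        if (len(x) == 1 or len(y) == 1) and x[0] == y[0]:
            continue
        return False
    return True
```

A pair flagged by `compatible` is only a candidate for merging: two different authors can share a surname and a first initial, so some additional evidence (coauthors, venues) would still be needed before merging records.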

There are also simple common misspellings, although these are harder to detect, since an edit distance of 1 or 2 could just as easily indicate a completely different name. One case which might be differentiated is when the edit is a deletion of a letter in a run of one or more of that same letter. For instance, "Acharya" vs. "Acharyya". Here it is likely that the second spelling simply has an extraneous y.
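One way to sketch the repeated-letter check is to collapse every run of the same character to a single character and compare the results. The helper names below are hypothetical, and this version is more permissive than a single deletion, since it ignores run lengths entirely.

```python
import re


def collapse_runs(s):
    # Replace each run of a repeated character with one copy of it.
    return re.sub(r"(.)\1+", r"\1", s.lower())


def differs_only_by_repeats(a, b):
    # True when the two spellings are not identical (ignoring case)
    # but become identical once repeated letters are collapsed.
    return a.lower() != b.lower() and collapse_runs(a) == collapse_runs(b)
```

With this sketch, "Acharya" and "Acharyya" are flagged as a likely pair. It would also flag genuinely distinct names that differ only by a doubled letter, so the result is best treated as a hint rather than a decision.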

macks22 avatar Feb 11 '15 23:02 macks22

@macks22 Any leads for this author name disambiguation? This indeed leads to duplicates resulting in inaccurate results for various tasks.

tejasshah93 avatar Sep 07 '15 20:09 tejasshah93

You might try using the Python nameparser package. I've had good luck with that in the past. Please submit a pull request if you manage to make some progress on that.

macks22 avatar Sep 07 '15 23:09 macks22