litstudy
Improve fuzzy matching when calculating statistics
Calculating the statistics requires fuzzy matching of names. Currently, this matching is deliberately conservative, since we want to avoid incorrectly merging two different names. The matching algorithm should be improved, possibly by adding parameters that control how aggressive it is, or by asking the user whether two names are equal.
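One way to combine both ideas (tunable parameters and asking the user) is a two-threshold matcher: pairs above an upper threshold are merged automatically, pairs below a lower threshold are kept apart, and only the ambiguous band in between is deferred to the user. The sketch below uses `difflib.SequenceMatcher` from the standard library as the similarity measure; the function names, the `accept`/`reject` defaults and the `ask` callback are hypothetical and not part of litstudy's API.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (difflib's Ratcliff/Obershelp)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def names_match(
    a: str,
    b: str,
    accept: float = 0.95,  # hypothetical: at or above this, merge automatically
    reject: float = 0.80,  # hypothetical: below this, never merge
    ask: Optional[Callable[[str, str], bool]] = None,  # hypothetical user prompt
) -> bool:
    score = similarity(a, b)
    if score >= accept:
        return True
    if score < reject:
        return False
    # Ambiguous band: defer to the user if a callback is given,
    # otherwise stay conservative and treat the names as different.
    return ask(a, b) if ask is not None else False
```

For example, "the University of Amsterdam" vs. "University of Amsterdam" scores 0.92, so it lands in the ambiguous band: without `ask` it is rejected (today's conservative behaviour), and with an interactive `ask` the user decides. Raw string similarity alone is a weak signal for abbreviations, so it likely works best after some normalization.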
Fuzzy matching appears in three places:
- Affiliation names (e.g., "University of Amsterdam" == "the University of Amsterdam")
- Author names (e.g., "John Doe" == "John. M. Doe"?)
- Venue names, i.e. conferences and journals (e.g., "Journal on Parallel Computing" == "J. Parallel Computing")
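Because each of the three cases fails in a different way, a per-category normalization step before comparison may help more than a single global threshold. The sketch below maps each name to a canonical key so that the examples above compare equal. All function names, the abbreviation table and the stopword list are hypothetical illustrations, not litstudy code.

```python
import re

# Hypothetical abbreviation table and stopword list for venue names.
VENUE_ABBREVIATIONS = {
    "j": "journal",
    "proc": "proceedings",
    "int": "international",
    "conf": "conference",
    "trans": "transactions",
}
VENUE_STOPWORDS = {"on", "of", "the", "in", "for", "and"}


def _tokens(name: str) -> list[str]:
    # Lowercase and split on runs of non-word characters (keeps accented letters).
    return [t for t in re.split(r"\W+", name.lower()) if t]


def normalize_affiliation(name: str) -> str:
    tokens = _tokens(name)
    # Drop a leading article: "the University of X" -> "university of x".
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    return " ".join(tokens)


def author_key(name: str) -> tuple[str, str]:
    # Key on (last name, first initial); handles "Doe, John" and "John M. Doe".
    if "," in name:
        last, _, first = name.partition(",")
        last_tokens, first_tokens = _tokens(last), _tokens(first)
    else:
        tokens = _tokens(name)
        first_tokens, last_tokens = tokens[:-1], tokens[-1:]
    initial = first_tokens[0][0] if first_tokens else ""
    return (" ".join(last_tokens), initial)


def normalize_venue(name: str) -> str:
    # Expand known abbreviations and drop stopwords before comparing.
    tokens = [VENUE_ABBREVIATIONS.get(t, t) for t in _tokens(name)]
    return " ".join(t for t in tokens if t not in VENUE_STOPWORDS)
```

With these keys, "John Doe" and "John. M. Doe" both map to `("doe", "j")`, and both venue spellings map to `"journal parallel computing"`. The trade-off is the one raised in this issue: keying authors on the first initial will also merge distinct people such as "John Doe" and "Jane Doe", so a stricter key (or the user-confirmation step) may still be needed.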