zingg
Match likelihood model based on prevalence of name or term in a field
Is your feature request related to a problem? Please describe. If an entity is represented only by first name and last name, with few other identifying fields, can we arrive at a likelihood of a match based on how common the first name and last name are in combination?
Describe the solution you'd like A model that estimates the likelihood of a match from the number of occurrences of a name, either within the file that contains it or within a dataset created for that purpose.
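To make the request concrete, here is a minimal sketch of one way such a score could be computed. It is not Zingg code and does not use any Zingg API. The function name `name_rarity_weights` and the log-ratio weighting are assumptions chosen for illustration: a name that occurs rarely in the file gets a higher weight, so agreement on it would count as stronger evidence of a match.

```python
import math
from collections import Counter


def name_rarity_weights(names):
    """Map each normalized full name to log2(total / count).

    Hypothetical helper for illustration only: rarer names receive
    larger weights, common names receive weights close to zero.
    """
    # Lowercase and collapse whitespace so trivial variants count together.
    normalized = [" ".join(n.lower().split()) for n in names]
    counts = Counter(normalized)
    total = len(normalized)
    return {name: math.log2(total / count) for name, count in counts.items()}


# Made-up sample data, not taken from any real dataset.
names = ["John Smith", "john smith", "John  Smith", "Zelda Quartermaine"]
weights = name_rarity_weights(names)
```

With this sample, "zelda quartermaine" receives a larger weight than "john smith", because it appears once in four records rather than three times.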
Describe alternatives you've considered Excel-based scoring of the number of occurrences of a name within a column. One issue is that an overrepresentation of one person can falsely inflate the apparent prevalence of that name in general.
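One possible way to soften the overrepresentation problem, sketched below, is to count a name once per distinct value of some auxiliary field rather than once per row. This assumes at least one extra field is available (here a hypothetical `city` value). The function name `distinct_prevalence` is also made up for this example, and the approach is a heuristic, not a definitive fix.

```python
from collections import defaultdict


def distinct_prevalence(records):
    """Count, per name, the number of distinct auxiliary values seen.

    Hypothetical helper: repeated rows for the same (name, aux) pair
    are counted once, so one heavily repeated person does not inflate
    the apparent prevalence of the name.
    """
    seen = defaultdict(set)
    for name, aux in records:
        seen[" ".join(name.lower().split())].add(aux)
    return {name: len(aux_values) for name, aux_values in seen.items()}


# Made-up sample rows of (name, city).
records = [
    ("Ann Lee", "Austin"),
    ("Ann Lee", "Austin"),
    ("Ann Lee", "Austin"),
    ("Ann Lee", "Boston"),
    ("Raj Patel", "Denver"),
]
prevalence = distinct_prevalence(records)
```

A plain row count would report "ann lee" four times. Counting distinct (name, city) pairs reports it twice, which is closer to the number of different people who plausibly carry the name.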
Additional context
Thanks for reporting @ibastian.
linking to open issue for implementing #74