Match likelihood model based on prevalence of name or term in a field

Open ibastian opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.
If an entity is represented by first name and last name but has few other identifying fields, can we arrive at a match likelihood based on how common the first name and last name are when taken together?

Describe the solution you'd like
A model that represents the likelihood of a match based on how often a name occurs, either within the file that contains it or within a separately created reference dataset.

Describe alternatives you've considered
Excel-based scoring of the number of occurrences of a name within a column. One issue is that overrepresentation of one person can falsely inflate the apparent prevalence of that name in general.
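
To make the request concrete, here is a minimal sketch of a frequency-based agreement weight in plain Python. It is not Zingg's API or anything Zingg implements today. It assumes a Fellegi-Sunter-style weight `log2(m / u)`, where `u` is approximated by the name's relative frequency and `m` is an assumed probability (0.95 here) that true matches agree on the name. The field names, the `dedupe_field` parameter, and the sample records are all hypothetical. Counting each (name, `dedupe_field`) pair only once is one way to limit the overrepresentation problem mentioned above.

```python
import math
from collections import Counter


def name_frequencies(records, key_fields=("first_name", "last_name"), dedupe_field=None):
    """Return the relative frequency of each normalized name.

    If dedupe_field is given (hypothetical helper field), records that share
    both the name and that field's value are counted once, so one person who
    appears many times does not inflate the name's frequency.
    """
    seen = set()
    counts = Counter()
    for rec in records:
        name = tuple(rec[f].strip().lower() for f in key_fields)
        if dedupe_field is not None:
            marker = (name, rec[dedupe_field])
            if marker in seen:
                continue
            seen.add(marker)
        counts[name] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def agreement_weight(name, freqs, m=0.95):
    """Evidence for a match when two records agree on `name`.

    u is approximated by the name's relative frequency; m is an assumed
    probability that true matches agree on the name.
    """
    u = freqs[name]
    return math.log2(m / u)


# Made-up sample data for illustration only.
records = [
    {"first_name": "John", "last_name": "Smith", "dob": "1980-01-01"},
    {"first_name": "John", "last_name": "Smith", "dob": "1980-01-01"},  # same person repeated
    {"first_name": "John", "last_name": "Smith", "dob": "1975-06-30"},
    {"first_name": "Zebedee", "last_name": "Quill", "dob": "1990-03-15"},
]

freqs = name_frequencies(records, dedupe_field="dob")
common = agreement_weight(("john", "smith"), freqs)
rare = agreement_weight(("zebedee", "quill"), freqs)
print(f"common name weight: {common:.2f}, rare name weight: {rare:.2f}")
```

Agreement on a rare name yields a larger weight than agreement on a common one, which is the behaviour being requested. In practice the frequencies would likely come from a larger reference dataset rather than the file being matched.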

Additional context

ibastian, Jul 21 '22 13:07

Thanks for reporting @ibastian.

Linking to the open issue for implementing this, #74.

sonalgoyal, Jul 23 '22 00:07