
Handle set-wise comparison and pooling

Open · jnothman opened this issue on Nov 08 '17 · 0 comments

If one or more of my datasets includes a list of possible values for a field (e.g. alternative names in the authoritative record for some entity), I may want to compare every value in that field against every value in the corresponding field of the other dataset, and then pool the results (e.g. max or mean) to get an overall score for that pair of records. The values might be numbers, strings, addresses, and so on.
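A minimal sketch of the compare-all-then-pool idea, written as a plain Python function rather than anything in the recordlinkage API. The names `setwise_score` and `string_sim` are hypothetical, `difflib.SequenceMatcher` stands in for whatever string metric you actually use, and returning 0.0 when one side has no values is an assumption:

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Callable, Iterable, Sequence


def string_sim(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for any string metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def setwise_score(
    left: Iterable,
    right: Iterable,
    sim: Callable = string_sim,
    pool: Callable[[Sequence[float]], float] = max,
) -> float:
    """Compare every value in `left` with every value in `right`, then pool.

    `pool` can be max, statistics.mean, min, or any function of a list of scores.
    """
    scores = [sim(a, b) for a, b in product(left, right)]
    if not scores:  # assumption: an empty set on either side scores 0.0
        return 0.0
    return pool(scores)


# Strings: the best-matching alternative name wins under max pooling.
print(setwise_score(["IBM", "International Business Machines"], ["I.B.M."]))

# Numbers: pass a numeric similarity instead of the default string metric.
print(setwise_score([1.0, 5.0], [4.0], sim=lambda a, b: 1 / (1 + abs(a - b))))
```

Because the metric and the pooling function are both parameters, the same helper covers sets of numbers, strings, or addresses; only `sim` changes.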

I can handle this either with a custom comparison function (which might itself wrap a Compare), or by duplicating rows in my input for every combination of set values and constructing the corresponding candidate links. Neither approach is straightforward, yet this seems like a common need in record linkage.
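The row-duplication workaround can be sketched without the library: expand each candidate record pair into one link per combination of values, score each expanded link, then group the scores back by the original pair and pool them. This is a standard-library sketch under stated assumptions; the toy data and the helpers `expand_links` and `pool_by_pair` are hypothetical, and `SequenceMatcher` again stands in for a real string metric:

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy data: record id -> list of alternative values for one field.
names_a = {"a1": ["IBM", "International Business Machines"], "a2": ["Acme Corp"]}
names_b = {"b1": ["I.B.M."], "b2": ["ACME Corporation", "Acme"]}
candidate_pairs = [("a1", "b1"), ("a2", "b2"), ("a1", "b2")]


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def expand_links(pairs, left, right):
    """Expand each candidate (id_a, id_b) into one link per value combination.

    Note: a pair where either record has no values produces no links.
    """
    return [
        (id_a, id_b, va, vb)
        for id_a, id_b in pairs
        for va in left.get(id_a, [])
        for vb in right.get(id_b, [])
    ]


def pool_by_pair(expanded, sim, pool=max):
    """Score every expanded link, then pool the scores per original record pair."""
    grouped = defaultdict(list)
    for id_a, id_b, va, vb in expanded:
        grouped[(id_a, id_b)].append(sim(va, vb))
    return {pair: pool(scores) for pair, scores in grouped.items()}


links = expand_links(candidate_pairs, names_a, names_b)
for pair, score in pool_by_pair(links, sim).items():
    print(pair, round(score, 3))
```

The number of expanded links grows with the product of the set sizes for each candidate pair, which is worth keeping in mind for fields with many alternatives.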

Could we at least get a recipe or a helper for this?

jnothman, Nov 08 '17 05:11