
Support multiple bloom filters

Open hardbyte opened this issue 8 years ago • 0 comments

It may make sense to calculate multiple CLKs per record, each using a different set of fields. This could improve matching and blocking, allow matching with organisations that only have a subset of the fields, and, most importantly, support automated methods for determining the threshold.

Example

If the possible fields are Name, Phone, Address and Email, we might calculate CLKs for:

  • Name, Phone, Address, Email
  • Name, Phone, Address
  • Name, Address, Email
  • Phone, Address, Email
  • Name, Phone
  • Name, Address
  • Name, Email
  • Phone, Address
  • Phone, Email
  • etc
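As a rough sketch of the enumeration above, the field sets can be generated with `itertools.combinations` and one filter computed per set. Everything here is an assumption for illustration only: `toy_clk` is a simplified keyed bigram bloom filter and not the real CLK encoding, and `FILTER_BITS`, `NUM_HASHES`, `field_subsets` and `multi_clk` are hypothetical names that do not exist in anonlink.

```python
import hashlib
from itertools import combinations

FIELDS = ("Name", "Phone", "Address", "Email")
FILTER_BITS = 1024  # assumed filter length, for illustration only
NUM_HASHES = 4      # assumed number of bit positions set per bigram


def field_subsets(fields, min_size=2):
    """Yield every combination of fields with at least min_size members, largest first."""
    for size in range(len(fields), min_size - 1, -1):
        yield from combinations(fields, size)


def bigrams(value):
    """Return the set of character bigrams of a space-padded, lower-cased value."""
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def toy_clk(record, fields, key):
    """Toy stand-in for a CLK: the set of bit positions set by the chosen fields."""
    bits = set()
    for field in fields:
        for gram in bigrams(record[field]):
            for k in range(NUM_HASHES):
                digest = hashlib.blake2b(
                    f"{field}|{gram}|{k}".encode(), key=key, digest_size=8
                ).digest()
                bits.add(int.from_bytes(digest, "big") % FILTER_BITS)
    return frozenset(bits)


def multi_clk(record, key, min_size=2):
    """Map each field set (a tuple of field names) to its toy filter."""
    return {
        subset: toy_clk(record, subset, key)
        for subset in field_subsets(FIELDS, min_size)
    }
```

With four fields and a minimum of two fields per set this yields 11 filters per record, so storage and comparison cost grow quickly as fields are added. That cost is one of the things the investigation would need to weigh.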

We need to investigate further and determine what changes would be required in anonlink to support matching datasets that contain sets of different CLKs.
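One possible direction, sketched under heavy assumptions: if each record carries a mapping from field set to filter, two records could be compared on the largest field set they both provide, using the Sørensen-Dice coefficient. The data layout (a dict keyed by tuples of field names, with filters as sets of bit positions) and the helpers `dice`, `best_shared_field_set` and `compare` are hypothetical and not part of anonlink's current API.

```python
def dice(a, b):
    """Sørensen-Dice coefficient of two filters given as sets of set-bit positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_shared_field_set(clks_a, clks_b):
    """Pick the largest field set for which both records have a filter.

    Ties are broken by the field names so the choice is deterministic.
    """
    shared = set(clks_a) & set(clks_b)
    if not shared:
        return None
    return max(shared, key=lambda field_set: (len(field_set), field_set))


def compare(clks_a, clks_b):
    """Return (field_set, similarity), or None when no field set is shared."""
    field_set = best_shared_field_set(clks_a, clks_b)
    if field_set is None:
        return None
    return field_set, dice(clks_a[field_set], clks_b[field_set])
```

Because similarities computed over different field sets are not directly comparable, a threshold would probably have to be chosen per field set, which ties back to the automated threshold question above.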

Aha! Link: https://csiro.aha.io/features/ANONLINK-78

hardbyte • May 30 '17 01:05