anonlink
Support multiple bloom filters
It may make sense to calculate multiple CLKs per record, each using a different field set. This could improve matching and blocking, allow matching with organisations that hold only a subset of the fields, and, most importantly, support automated methods for determining the threshold.
Example
If the possible fields are Name, Phone, Address, and Email, we might calculate CLKs for:
- Name, Phone, Address, Email
- Name, Phone, Address
- Name, Address, Email
- Phone, Address, Email
- Name, Phone
- Name, Address
- Name, Email
- Phone, Address
- Phone, Email
- etc
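
As a rough illustration of how the field sets above could be enumerated, here is a minimal sketch in Python. The `field_subsets` helper and the `min_size` cut-off of two fields are assumptions made for this example; nothing here is part of anonlink.

```python
from itertools import combinations

# The candidate fields from the example above.
FIELDS = ("Name", "Phone", "Address", "Email")


def field_subsets(fields, min_size=2):
    """Return every subset of `fields` with at least `min_size` members.

    Hypothetical helper: larger subsets come first, and within a size
    the original field order is preserved.
    """
    subsets = []
    for size in range(len(fields), min_size - 1, -1):
        subsets.extend(combinations(fields, size))
    return subsets


subsets = field_subsets(FIELDS)
for subset in subsets:
    print(", ".join(subset))
```

With four fields and a minimum of two fields per CLK this gives 11 field sets, so each record would carry 11 CLKs. The count grows quickly as fields are added, which may be a reason to restrict the list to the combinations that are actually useful.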
We need to investigate further and determine what changes anonlink would require to support matching datasets that contain multiple CLKs per record, each built from a different field set.
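
One possible starting point for that investigation is sketched below, assuming each party uploads its CLKs keyed by the field set they were built from. The dictionary layout, the `shared_field_sets` helper, and the placeholder byte strings are all hypothetical and do not reflect anonlink's current data model.

```python
def shared_field_sets(clks_a, clks_b):
    """Return the field sets for which both parties supplied CLKs.

    Hypothetical helper: results are ordered largest field set first,
    on the assumption that CLKs built from more fields are preferred.
    """
    common = set(clks_a) & set(clks_b)
    return sorted(common, key=lambda fs: (-len(fs), sorted(fs)))


# Placeholder byte strings stand in for real encoded Bloom filters.
org_a = {
    frozenset({"Name", "Phone", "Address", "Email"}): [b"\x00", b"\x01"],
    frozenset({"Name", "Phone", "Address"}): [b"\x02", b"\x03"],
    frozenset({"Name", "Phone"}): [b"\x04", b"\x05"],
}
org_b = {
    frozenset({"Name", "Phone", "Address"}): [b"\x06"],
    frozenset({"Name", "Phone"}): [b"\x07"],
    frozenset({"Phone", "Address"}): [b"\x08"],
}

shared = shared_field_sets(org_a, org_b)
for field_set in shared:
    print(sorted(field_set))
```

Only CLKs built from the same field set are comparable, so in this sketch the two organisations could be matched on Name, Phone, Address and on Name, Phone, but not on the four-field CLK that only the first organisation holds.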
Aha! Link: https://csiro.aha.io/features/ANONLINK-78