anonlink-entity-service
Document assumptions about the input error characteristics
Each PII field may have different types of errors (e.g. missing data, transcription errors, entirely changed data), so we need to document any built-in assumptions about them.
- Matching common names: "Wu", "Smith", etc.
- Switching fields: "Dexter Cody" vs "Cody Dexter"
- Edits which preserve letter frequency but not bigrams: "Gerg" vs "Greg"
- Address changes: "123 blah lane" vs "84 another street"
- Format issues: "14/8/2018" vs "8/14/2018"
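To make a few of these cases concrete, here is a rough sketch of how a plain bigram comparison reacts to them. It assumes space-padded character bigrams and a Dice coefficient over the raw bigram sets. The helper names `bigrams` and `dice` are made up for illustration and are not the clkhash or anonlink API, and no Bloom filter encoding is applied, so actual scores in the service will differ.

```python
def bigrams(value: str) -> set:
    """Return the set of character bigrams of a space-padded, lowercased string.

    Hypothetical helper for illustration only, not part of clkhash.
    """
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient between the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    # Padding guarantees each set has at least one bigram, so no zero division.
    return 2 * len(x & y) / (len(x) + len(y))


if __name__ == "__main__":
    # Transposition: same letters, but most bigrams change.
    print("Gerg vs Greg:", dice("Gerg", "Greg"))

    # Switched fields, compared field by field (first name vs first name).
    print("Dexter vs Cody:", dice("Dexter", "Cody"))

    # Switched fields, compared as one concatenated string.
    print("full name:", dice("Dexter Cody", "Cody Dexter"))

    # Day/month order swapped in a date string.
    print("date:", dice("14/8/2018", "8/14/2018"))
```

Under these assumptions the transposed name scores 0.4, the swapped names score 0.0 when compared field by field but 1.0 when concatenated, and the reordered date scores 0.8. That spread is the kind of built-in behaviour worth writing down per field.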
I've put the issue here but it might make more sense in clkhash.
cc: @wilko77 @nbgl
Aha! Link: https://csiro.aha.io/features/ANONLINK-13