dedupe
partially supervised classification
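PU (positive and unlabeled) learning looks like it might be a good fit for record-linkage problems, where you typically have a handful of known matching pairs and a large pool of unlabeled candidate pairs: https://www.cs.uic.edu/~liub/NSF/PSC-IIS-0307239.html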
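A minimal sketch of what PU learning could look like for linkage, under stated assumptions. It uses the Elkan & Noto (2008) correction, which is one well-known PU technique and not necessarily the approach described on the linked page. The idea is to train a classifier g(x) to separate labeled pairs from unlabeled ones, estimate c = P(labeled | match) as the mean of g over the labeled pairs, and then take P(match | x) = g(x) / c. This assumes that the labeled matches are a random sample of all matches ("selected completely at random"). The feature matrix `X` (per-field similarity scores for each candidate pair) and the names `fit_logistic` and `pu_match_probability` are hypothetical. Logistic regression is fitted with Newton's method only to keep the sketch dependency-free beyond NumPy.

```python
import numpy as np


def _sigmoid(z):
    # Clip logits so np.exp cannot overflow for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))


def fit_logistic(X, s, l2=1e-3, iters=30):
    """L2-penalised logistic regression fitted with Newton's method (IRLS).

    Returns coefficients with the intercept first.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    penalty = l2 * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not penalise the intercept
    for _ in range(iters):
        p = _sigmoid(Xb @ beta)
        grad = Xb.T @ (s - p) - penalty @ beta
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None]) + penalty
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def pu_match_probability(X, s):
    """Elkan & Noto style PU estimate of P(match | x) for each candidate pair.

    X: (n_pairs, n_features) comparison features, e.g. per-field similarities.
    s: 1 for pairs already known to be matches, 0 for unlabeled pairs.
    Assumes labeled matches are a random sample of all matches.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    # Step 1: g(x) ~ P(labeled | x), treating every unlabeled pair as negative.
    beta = fit_logistic(X, s)
    g = _sigmoid(np.column_stack([np.ones(len(X)), X]) @ beta)
    # Step 2: c = P(labeled | match), estimated as the mean of g over labeled pairs.
    # Elkan & Noto estimate this on a held-out set; reusing training data is a shortcut.
    c = g[s == 1].mean()
    # Step 3: rescale and clip to a valid probability.
    return np.clip(g / c, 0.0, 1.0)
```

Unlabeled pairs that receive a high score are the candidate matches to review. The `g / c` rescaling is what separates this from simply treating every unlabeled pair as a non-match.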
splink seems to estimate which records are matches using an unsupervised Expectation Maximisation (EM) algorithm, which it doesn't explain very well and which I couldn't find a good explanation of anywhere.
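A minimal sketch of how this kind of unsupervised EM can work, assuming the classic Fellegi-Sunter setup (which, as far as I know, is the model splink is built on). Each candidate pair is reduced to a vector of 0/1 field agreements, and the fields are assumed conditionally independent given match status. EM then estimates three things: the match proportion λ (`lam`), m_k = P(field k agrees | match), and u_k = P(field k agrees | non-match). This is not splink's code. The function names are hypothetical, and splink itself handles more than this (e.g. multi-level comparisons), so details will differ.

```python
import numpy as np


def match_probability(gamma, lam, m, u):
    """Posterior P(match | gamma) under the Fellegi-Sunter model with
    conditionally independent binary agreement fields."""
    gamma = np.asarray(gamma)
    like_m = np.prod(np.where(gamma == 1, m, 1.0 - m), axis=1)  # P(gamma | match)
    like_u = np.prod(np.where(gamma == 1, u, 1.0 - u), axis=1)  # P(gamma | non-match)
    return lam * like_m / (lam * like_m + (1.0 - lam) * like_u)


def fellegi_sunter_em(gamma, lam=0.1, m0=0.9, u0=0.1, iters=200, eps=1e-6):
    """Unsupervised EM estimate of (lam, m, u) from 0/1 agreement vectors.

    gamma: (n_pairs, n_fields) array, 1 where the field agrees for that pair.
    lam:   initial guess for the proportion of pairs that are matches.
    m0/u0: initial agreement probabilities; starting with m0 > u0 ties the
           first latent class to "match".
    """
    gamma = np.asarray(gamma, dtype=float)
    n_fields = gamma.shape[1]
    m = np.full(n_fields, m0)
    u = np.full(n_fields, u0)
    for _ in range(iters):
        # E-step: soft match weight for every pair given the current parameters.
        g = match_probability(gamma, lam, m, u)
        # M-step: weighted agreement rates per field and the overall match proportion.
        lam = float(np.clip(g.mean(), eps, 1.0 - eps))
        m = np.clip(g @ gamma / g.sum(), eps, 1.0 - eps)
        u = np.clip((1.0 - g) @ gamma / (1.0 - g).sum(), eps, 1.0 - eps)
    return lam, m, u
```

No labels are used anywhere, which is what makes it "unsupervised". The two latent classes are symmetric, though, so initialising m above u is what makes the recovered class mean "match" rather than the labels coming out swapped.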