Duke
Deduplication and record linkage
Hi, given two datasets, is it possible to deduplicate each dataset and then perform record linkage across the two? Please advise.
Hi!
You will need to do this step by step: first deduplicate each dataset individually (writing the result to a new file, for example), then link the two deduplicated datasets. There is no way to do both at the same time.
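To make the order of the steps concrete, here is a rough sketch in plain Java. It does not use Duke's API at all: the `Rec` class, the `key`, `deduplicate` and `link` helpers, and the sample names are all made up for illustration, and the exact match on a normalized name is only a naive stand-in for the comparators and thresholds you would configure in Duke.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: none of these names come from Duke.
class DedupThenLink {

    // Hypothetical minimal record: an id plus one field to match on.
    static final class Rec {
        final String id;
        final String name;
        Rec(String id, String name) { this.id = id; this.name = name; }
    }

    // Naive matching key (assumption): trimmed, lower-cased name with
    // runs of whitespace collapsed. Duke would compare fuzzily instead.
    static String key(Rec r) {
        return r.name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Step 1: deduplicate ONE dataset, keeping the first record per key.
    static List<Rec> deduplicate(List<Rec> records) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : records) {
            byKey.putIfAbsent(key(r), r);
        }
        return new ArrayList<>(byKey.values());
    }

    // Step 2: link two already-deduplicated datasets.
    // Each link is reported as "leftId=rightId".
    static List<String> link(List<Rec> left, List<Rec> right) {
        Map<String, Rec> index = new LinkedHashMap<>();
        for (Rec r : right) {
            index.putIfAbsent(key(r), r);
        }
        List<String> links = new ArrayList<>();
        for (Rec l : left) {
            Rec match = index.get(key(l));
            if (match != null) {
                links.add(l.id + "=" + match.id);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Rec> a = Arrays.asList(
            new Rec("a1", "Ada Lovelace"),
            new Rec("a2", "ada  lovelace"),   // duplicate of a1
            new Rec("a3", "Alan Turing"));
        List<Rec> b = Arrays.asList(
            new Rec("b1", "ALAN TURING"),
            new Rec("b2", "Grace Hopper"),
            new Rec("b3", "grace hopper"));   // duplicate of b2

        List<Rec> cleanA = deduplicate(a);    // first pass, dataset A
        List<Rec> cleanB = deduplicate(b);    // first pass, dataset B
        System.out.println(link(cleanA, cleanB)); // second pass: prints [a3=b1]
    }
}
```

With Duke itself, each step would be a separate run with its own configuration; the sketch only shows that the linkage step takes the cleaned outputs of the two deduplication steps as its input.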
At one point I wanted to do something similar from Python, by capturing the matches/links that the command java no.priv.garshol.duke.Duke .... config.xml prints to the console. It was a waste of time. You are better off working directly in Java and using the MatchListener classes, writing your own if you need to.