
Deduplication and record linkage

Open · xinelim opened this issue 6 years ago · 1 comment

Hi, given two datasets, is it possible to deduplicate each dataset and then perform record linkage across the two? Please advise.

xinelim · Oct 31 '18 23:10

Hi!

You will need to do this step by step: deduplicate each dataset individually (writing the result to a new file, for example) and then link the two deduplicated datasets. There is no way to do both at the same time.

At one point I wanted to do something similar from Python, by reading the matches/links from the console output of the command java no.priv.garshol.Duke .... config.xml. It was a waste of time. You should go directly with Java and use the MatchListener classes, and write your own listener if you need to.
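To make the order of the two steps concrete, here is a minimal sketch in plain Java. It does not use Duke's API at all: the class `TwoStepLinkage`, its methods, and the exact-match key (trimmed, lower-cased name) are assumptions made up for illustration, standing in for the fuzzy comparison Duke would do based on your config.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not Duke code.
public class TwoStepLinkage {

    // Toy match key: trimmed, lower-cased, inner whitespace collapsed.
    public static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Step 1: deduplicate one dataset, keeping the first record per key.
    public static List<String> deduplicate(List<String> records) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String r : records) {
            firstByKey.putIfAbsent(key(r), r);
        }
        return new ArrayList<>(firstByKey.values());
    }

    // Step 2: link two already-deduplicated datasets on the same key.
    public static List<String[]> link(List<String> left, List<String> right) {
        Map<String, String> rightByKey = new LinkedHashMap<>();
        for (String r : right) {
            rightByKey.putIfAbsent(key(r), r);
        }
        List<String[]> links = new ArrayList<>();
        for (String l : left) {
            String match = rightByKey.get(key(l));
            if (match != null) {
                links.add(new String[] {l, match});
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> a = deduplicate(List.of("Ann Lee", "ann  lee", "Bob Roy"));
        List<String> b = deduplicate(List.of("BOB ROY", "Cid Paz", "cid paz"));
        for (String[] pair : link(a, b)) {
            System.out.println(pair[0] + " <-> " + pair[1]); // Bob Roy <-> BOB ROY
        }
    }
}
```

With Duke itself, each of these steps would be a separate run with its own configuration, and the intermediate deduplicated data would be written out before the linkage run.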

uderline · Nov 01 '18 09:11