recordlinkage
Multiple Core Issues
Specifying the number of cores (n_jobs) appears to make the algorithm run slower.
import recordlinkage as rl

dupe_indexer = rl.Index()
dupe_indexer.block(['first_name_clean', 'last_name_clean'])
dupe_candidate_links = dupe_indexer.index(df)

compare_dupes = rl.Compare(n_jobs=12)  # <-- this runs significantly slower than:
compare_dupes = rl.Compare()
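To put a number on "significantly slower", a small timing harness may help. The sketch below is only an illustration: `time_call` is a hypothetical helper (not part of recordlinkage), and the workload in the demo is a stand-in so the snippet runs on its own.

```python
import time
from statistics import median


def time_call(fn, repeats=3):
    """Return the median wall-clock time in seconds over `repeats` calls to fn().

    `time_call` is a hypothetical helper for this issue, not a recordlinkage API.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch is runnable without any data.
    elapsed = time_call(lambda: sum(range(100_000)))
    print(f"median: {elapsed:.4f}s")

    # Assumed usage with the objects from this issue (needs `df`, the
    # candidate links, and at least one comparison added to each Compare):
    #   single = rl.Compare()
    #   multi = rl.Compare(n_jobs=12)
    #   for c in (single, multi):
    #       c.exact('first_name_clean', 'first_name_clean')
    #   print(time_call(lambda: single.compute(dupe_candidate_links, df)))
    #   print(time_call(lambda: multi.compute(dupe_candidate_links, df)))
```

Reporting both medians along with the number of candidate pairs and rows in `df` would make the slowdown easier to reproduce.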