recordlinkage issues

Addition of String Comparison Method - Jaccard Similarity

-> New method for string comparison: Jaccard Similarity -> Ran all tests: 207 passed, 90 failed, 2 errors, 768 warnings, none related to the Jaccard Similarity calculation -> Tested on...

debadridtt
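
For readers unfamiliar with the measure, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to character bigrams. The excerpt does not say how the proposed method tokenises strings, so the `qgrams` helper and the `q=2` default are assumptions made for illustration, not the contributor's implementation.

```python
def qgrams(s, q=2):
    """Return the set of character q-grams of s (hypothetical tokenisation)."""
    if len(s) < q:
        # Strings shorter than q are treated as a single token.
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_similarity(a, b, q=2):
    """Jaccard similarity |A & B| / |A | B| over q-gram sets."""
    set_a, set_b = qgrams(a, q), qgrams(b, q)
    if not set_a and not set_b:
        # Two empty strings are considered identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# "night" and "nacht" share one bigram ("ht") out of seven distinct bigrams.
score = jaccard_similarity("night", "nacht")
```

Working on sets means repeated q-grams count once; a multiset variant would weight repeats differently.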

K-Fold Cross validation in Record Linkage

1

The library documentation does not provide much guidance on train/test splits and cross-validation. See below an implementation using the KFold object in scikit-learn. How does the blocking strategy used...

mayerantoine

How to do a "LEFT JOIN", enforce on dataset to be on the comparison?

1

Amazing scripts you've got, thanks a lot for sharing. I'm trying to match payment records, but I couldn't find an option to "enforce" that one of the sets is present,...

ccrvlh

Calculate distance in addition to similarity

Hi, all string algorithms compute the similarity: 1 - distance / max_length_string. This puts short strings at a disadvantage compared to long strings, and in some cases, a...

AntoineLamer
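
The effect described in this issue can be seen with a small example. The sketch below uses a plain Levenshtein distance as a stand-in (the function names are hypothetical and this is not the library's own code) and applies the normalisation quoted above, 1 - distance / max_length_string. A single edit costs a short string far more similarity than a long one, even though the raw distance is the same.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_similarity(a, b):
    """Similarity as 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


# Both pairs differ by exactly one substitution.
short_pair = ("ab", "ac")
long_pair = ("abcdefghij", "abcdefghik")
short_sim = normalized_similarity(*short_pair)  # one edit over length 2
long_sim = normalized_similarity(*long_pair)    # one edit over length 10
```

Exposing the raw distance alongside the similarity would let a user threshold on "at most one edit" regardless of string length.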

Add textdistance matching algorithms in recordlinkage compare string

4

Hello, I am currently using this module for record linkage, and I am thinking of contributing some string matching algorithms that are implemented in [textdistance](https://github.com/life4/textdistance). I'm currently using...

rafmacalaba

Include option to use Spark dataframes

5

Hi, I'm considering writing an extension that makes it possible to use Spark dataframes with this tool, as they are pretty similar to Pandas dataframes, but do not necessarily have...

ohenrik

feature request

numeric offset vs scale

Hi, thank you for making this awesome library! I am a bit confused about the parameters of the numeric comparison function, specifically offset and scale. [Documentation for numeric](https://recordlinkage.readthedocs.io/en/latest/ref-compare.html) The graph arrow...

s3afroze
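
One way to reason about the two parameters is to write the decay functions out by hand. The sketch below is an assumption about how such functions are commonly defined (similarity stays at 1 while the absolute difference is within `offset`, and drops to 0.5 once it reaches `offset + scale`); it is not copied from the library, and the function names are hypothetical. Check it against the linked documentation before relying on it.

```python
def step_similarity(d, offset=0.0, origin=0.0):
    """1.0 while |d - origin| is within offset, otherwise 0.0."""
    return 1.0 if abs(d - origin) <= offset else 0.0


def linear_similarity(d, offset=0.0, scale=1.0, origin=0.0):
    """Flat at 1.0 up to offset, then a straight line reaching 0.5 at
    offset + scale and 0.0 at offset + 2 * scale (assumed definition)."""
    x = abs(d - origin)
    x = min(max(x, offset), offset + 2 * scale)  # clip to the sloped region
    return 1 - (x - offset) / (2 * scale)


def exp_similarity(d, offset=0.0, scale=1.0, origin=0.0):
    """Flat at 1.0 up to offset, then halves every `scale` units
    (assumed definition)."""
    x = max(abs(d - origin) - offset, 0.0)
    return 2 ** (-x / scale)


# With offset=2 and scale=1: differences up to 2 score 1.0,
# and a difference of 3 (= offset + scale) scores 0.5.
inside_offset = linear_similarity(1.5, offset=2, scale=1)
at_scale = linear_similarity(3, offset=2, scale=1)
```

Under these assumptions, `offset` sets the width of the tolerance band and `scale` sets how quickly similarity falls off beyond it.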

Multiple Core Issues

1

Specifying the number of cores (n_jobs) appears to make the algorithm run slower. dupe_indexer = rl.Index() dupe_indexer.block(['first_name_clean','last_name_clean']) dupe_candidate_links = dupe_indexer.index(df) compare_dupes = rl.Compare(n_jobs=12).

logisticregress

Issues with the geographic classification method

1

Hi, I'm just wondering if there is an example of using the current version of this package with the geographic method? If I try to add Haversine distance to my...

JosephKuchar
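
As background for the Haversine distance mentioned in this issue, here is a minimal standalone sketch of the formula in plain Python. It does not use the package's geographic comparison; the function name and the Earth radius constant are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points
    given as latitude/longitude in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# A quarter of the way around the equator.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

A distance like this can then be mapped to a similarity score with a decay function, so that nearby coordinates score close to 1.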

Fuzzy matching

4

I have 2 dataframes - df1 = pd.DataFrame() df2 = pd.DataFrame() df1['company_name'] = ['Crysagi Systems Pvt','Coreview.'] df2['company_name'] = ['Crysagi Systems Pvt Ltd','Coreview','sadadas'] I am trying to do a fuzzy search...

shreyaspuranik
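
To make the question concrete, the sketch below fuzzy-matches the two company name lists from the excerpt. It uses `difflib.SequenceMatcher` from the Python standard library rather than recordlinkage, purely to keep the example self-contained; the `best_matches` helper and the 0.8 threshold are assumptions, not part of the original post.

```python
from difflib import SequenceMatcher

# Company names taken from the issue excerpt.
left = ['Crysagi Systems Pvt', 'Coreview.']
right = ['Crysagi Systems Pvt Ltd', 'Coreview', 'sadadas']


def similarity(a, b):
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(left, right, threshold=0.8):
    """Map each left name to its most similar right name,
    or to None when no candidate reaches the threshold."""
    results = {}
    for name in left:
        candidate = max(right, key=lambda other: similarity(name, other))
        score = similarity(name, candidate)
        results[name] = candidate if score >= threshold else None
    return results


matches = best_matches(left, right)
```

This compares every left name with every right name, which is fine for a handful of records; for larger tables a blocking or indexing step would be needed to limit the candidate pairs.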
