recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

Results 63 recordlinkage issues

It is a little frustrating because I cannot find any explicit way to solve this in the record linkage documentation, though seemingly it would be a very commonplace...

Pandas datatypes, such as `pd.Int64Dtype` (see [here](https://pandas.pydata.org/docs/user_guide/basics.html#dtypes)), do not seem to be supported:

```python
import recordlinkage
from recordlinkage.datasets import load_febrl4

dfA, dfB = load_febrl4()
# Convert column types to pandas...
```

Hi, I am using the ECM classifier as the unsupervised classifier for my problem, but I keep getting an error when calling it that I do not understand: ecm.fit(df_feature_vectors) log_m_probablity...

```python
import recordlinkage

indexer = recordlinkage.Index()
indexer = recordlinkage.SortedNeighbourhoodIndex(on='label', window=9)
candidate_links = indexer.index(featuresfinal, targetfinal)

comp = recordlinkage.Compare()
comp.string('label', 'label', method='jarowinkler', label='labels')
mymatches = comp.compute(candidate_links, featuresfinal, targetfinal)
```
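
For readers unfamiliar with the sorted neighbourhood idea used in the snippet above, here is a minimal pure-Python sketch of the technique. It is a simplified illustration, not the recordlinkage implementation: the function name `sorted_neighbourhood_pairs`, the dict-based inputs, and the sample records are all hypothetical. It ranks the distinct key values of both datasets in sorted order and pairs each left record with right records whose key rank falls inside the window.

```python
def sorted_neighbourhood_pairs(left, right, window=3):
    """Return candidate (left_id, right_id) pairs.

    left, right: dicts mapping record id -> sorting key (hypothetical layout).
    window: odd number of neighbouring key values to consider.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2

    # Rank every distinct key value from both datasets in sorted order.
    keys = sorted(set(left.values()) | set(right.values()))
    rank = {key: i for i, key in enumerate(keys)}

    # Group right-hand record ids by the rank of their key.
    right_by_rank = {}
    for rid, key in right.items():
        right_by_rank.setdefault(rank[key], []).append(rid)

    # Pair each left record with right records within `half` ranks of it.
    pairs = []
    for lid, key in left.items():
        centre = rank[key]
        for r in range(centre - half, centre + half + 1):
            for rid in right_by_rank.get(r, []):
                pairs.append((lid, rid))
    return pairs


# Made-up sample data for illustration only.
left = {"a1": "smith", "a2": "jones"}
right = {"b1": "smyth", "b2": "jones", "b3": "adams"}
pairs = sorted_neighbourhood_pairs(left, right, window=3)
exact_only = sorted_neighbourhood_pairs(left, right, window=1)
```

With `window=1` only records with identical keys are paired, which is equivalent to plain blocking. A wider window also admits near neighbours in sort order, such as "smith" and "smyth".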

I've been developing some data corruption algorithms (inspired by the documentation at https://dmm.anu.edu.au/geco/flex-data-gen-manual.pdf, but without looking at the source code, since it has an unusual license), and I wonder if your...

Python 3.9.11, fastparquet 0.8.1: writing a dataframe to a parquet file from a table data field with RTF document content fails with a TypeError exception. fp.write(fpath, rows, compression='GZIP', row_group_offsets=row_group_offsets) fails with the traceback: TypeError:...

Hi, just wondering whether the EM algorithm for frequency-based estimates, or any other algorithm that takes value frequencies into account, is or will be included in the package? Thanks!

Hi, I am linking two datasets. Both of them contain unique IDs as identifiers. After reading the two datasets into pandas data frames, I set those IDs as their indexes. So...

Hello. I have around 0.3 million records and I have to build pairs on a minimum of 3 columns, so after doing that I have 40 million index records, and when I'm...
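
The snippet above is cut off, so the actual problem is not known. If the concern is holding tens of millions of candidate pairs in memory at once, one common approach is to generate blocked pairs lazily and process them in fixed-size chunks. The following is a minimal pure-Python sketch under that assumption. The helper names `blocked_pairs` and `chunked`, the tuple record layout, and the sample data are hypothetical and are not part of recordlinkage.

```python
from collections import defaultdict
from itertools import islice


def blocked_pairs(left, right, key):
    """Lazily yield (left_id, right_id) for records that agree on key(record)."""
    buckets = defaultdict(list)
    for rid, rec in right.items():
        buckets[key(rec)].append(rid)
    for lid, rec in left.items():
        # .get avoids creating empty buckets for keys only seen on the left.
        for rid in buckets.get(key(rec), ()):
            yield lid, rid


def chunked(iterable, size):
    """Yield lists of at most `size` items without materialising the input."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Made-up records: (surname, city, birth_year). Blocking on all three columns.
left = {1: ("smith", "york", 1980), 2: ("jones", "leeds", 1975), 3: ("smith", "york", 1980)}
right = {10: ("smith", "york", 1980), 11: ("jones", "leeds", 1990), 12: ("smith", "york", 1980)}


def block_key(rec):
    return rec[:3]


# Only one chunk of pairs is held in memory at a time.
chunk_sizes = [len(chunk) for chunk in chunked(blocked_pairs(left, right, block_key), 3)]
```

In a real pipeline each chunk would be compared and scored before the next one is generated, so peak memory depends on the chunk size rather than on the total number of pairs.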

The docs do not mention which languages are supported.