data-matching topic

List data-matching repositories

data-matching-software

350 stars, 41 forks

A list of free data matching and record linkage software.

recordlinkage

915 stars, 150 forks

A powerful and modular toolkit for record linkage and duplicate detection in Python.

recordlinkage-annotator

41 stars, 8 forks

A browser user interface for manual labeling of record pairs.

splink

1.1k stars, 128 forks

Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.

entity-embed

139 stars, 13 forks

PyTorch library for transforming entities such as companies and products into vectors, supporting scalable record linkage and entity resolution via approximate nearest neighbors.

fuzzymatcher

280 stars, 60 forks

Record linking package that fuzzy-matches two pandas DataFrames using SQLite's FTS4 full-text search.

soweego

95 stars, 8 forks

Link Wikidata items to large catalogs.

record-linkage-resources

105 stars, 15 forks

Resources for tackling record linkage, deduplication, and data matching problems.

levitate

33 stars, 2 forks

Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).

nominally

25 stars, 0 forks

A maximum-strength name parser for record linkage.