Entity resolution topic

Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets on entities that may or may not share a common identifier (e.g., a database key, URI, or national identification number). The lack of a common identifier may be due to differences in record shape, storage location, or curator style or preference.
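To make the task concrete, here is a minimal sketch of linking two hypothetical sources (a CRM export and a billing table, both invented for illustration) that share no common key. It assumes that normalizing names and cities and comparing them with Python's standard-library `difflib.SequenceMatcher` is good enough, and the 0.85 threshold is an arbitrary choice. The helpers `normalize`, `similarity`, and `link_records` are hypothetical and are not taken from any of the libraries listed below. The sketch compares every pair of records, which is only practical for small inputs; real systems usually add blocking and probabilistic or learned match weights.

```python
from difflib import SequenceMatcher
import unicodedata


def normalize(text: str) -> str:
    # Strip accents, lowercase, and collapse whitespace so that
    # "José  García" and "jose garcia" compare as equal strings.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1] on the normalized strings.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """For each left record, keep its best right match if the score reaches threshold.

    Returns a list of (left_id, right_id, score) tuples. Hypothetical helper:
    compares all pairs (no blocking) and averages name and city similarity.
    """
    links = []
    for l_rec in left:
        best = None
        for r_rec in right:
            score = (similarity(l_rec["name"], r_rec["name"])
                     + similarity(l_rec["city"], r_rec["city"])) / 2
            if best is None or score > best[2]:
                best = (l_rec["id"], r_rec["id"], score)
        if best is not None and best[2] >= threshold:
            links.append(best)
    return links


# Two invented sources whose keys have different formats (strings vs. integers),
# so they cannot be joined on an identifier.
crm = [
    {"id": "c1", "name": "José García", "city": "Madrid"},
    {"id": "c2", "name": "Anna Schmidt", "city": "Berlin"},
]
billing = [
    {"id": 101, "name": "Jose Garcia", "city": "madrid"},
    {"id": 102, "name": "A. Smith", "city": "London"},
]

for left_id, right_id, score in link_records(crm, billing):
    print(left_id, right_id, round(score, 2))
```

Run as-is, it links `c1` to `101` with a score of 1.0, because accents and letter case are normalized away, while `c2` has no candidate above the threshold and stays unlinked.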

Entity resolution repositories

splink

1.3k stars, 146 forks

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

entity-embed

139 stars, 13 forks

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

nlu

825 stars, 126 forks

1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

JedAIToolkit

201 stars, 45 forks

An open source, high scalability toolkit in Java for Entity Resolution.

sparker

61 stars, 18 forks

SparkER: an Entity Resolution framework for Apache Spark

anonlink

60 stars, 7 forks

Python implementation of anonymous linkage using cryptographic linkage keys

soweego

95 stars, 8 forks

Link Wikidata items to large catalogs

rltk

103 stars, 23 forks

Record Linkage ToolKit (Find and link entities)

record-linkage-resources

123 stars, 15 forks

Resources for tackling record linkage / deduplication / data matching problems