Entity resolution topic

Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); an identifier can be missing because of differences in record shape, storage location, or curator style or preference.
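To make the definition concrete, here is a minimal sketch of matching records from two sources that share no identifier. It uses only Python's standard library (`difflib.SequenceMatcher`) and is not the method of any repository listed on this page. The sample records, the `similarity` and `resolve` helpers, the choice of `city` as a blocking key, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical records from two sources with no shared identifier.
source_a = [
    {"id": "a1", "name": "Jon Smith", "city": "Boston"},
    {"id": "a2", "name": "Maria Garcia", "city": "Austin"},
]
source_b = [
    {"id": "b1", "name": "John Smith", "city": "Boston"},
    {"id": "b2", "name": "Marie Garcia", "city": "Denver"},
    {"id": "b3", "name": "Wei Chen", "city": "Seattle"},
]


def similarity(left: str, right: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def resolve(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b, score) for pairs judged to be the same entity."""
    matches = []
    for a in records_a:
        for b in records_b:
            # Blocking (assumed rule): only compare records in the same city.
            if a["city"] != b["city"]:
                continue
            score = similarity(a["name"], b["name"])
            # The threshold is an illustrative value, not a recommended one.
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 2)))
    return matches


matches = resolve(source_a, source_b)
print(matches)
```

With this sample data, only the pair `a1` and `b1` is linked: the two "Garcia" records are never compared because their cities differ. Real toolkits typically replace the hand-picked threshold with a learned or probabilistic classifier and use more robust blocking than a single exact field.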

Entity resolution repositories

data-matching-software

349 stars, 41 forks

A list of free data matching and record linkage software.

FEBRL-fork-v0.4.2

23 stars, 21 forks

Fork of the Freely Extensible Biomedical Record Linkage program

recordlinkage

915 stars, 150 forks

A powerful and modular toolkit for record linkage and duplicate detection in Python.

recordlinkage-annotator

41 stars, 8 forks

A browser user interface for manual labeling of record pairs.

dedupe

4.0k stars, 539 forks

A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

zingg

890 stars, 108 forks

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Entity-Linking-Recent-Trends

338 stars, 19 forks

Recent trends of Entity Linking, Disambiguation, and Representation.

vert-papers

264 stars, 91 forks

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microso...

csvdedupe

403 stars, 83 forks

Command-line tool for deduplicating CSV files.

dedupe-examples

394 stars, 216 forks

Examples for using the dedupe library.