Entity resolution topic
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
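To make the matching task described above concrete, here is a minimal sketch of linking two small record sets that share no common identifier. It relies only on Python's standard-library difflib. The record layout, the field name "name", the helper names (normalize, similarity, link_records), and the 0.85 threshold are all assumptions chosen for illustration; they do not come from any of the projects listed here, which use considerably more robust techniques such as blocking, probabilistic scoring, or learned embeddings.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower()
    )
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    A pair is kept only if its score reaches the (hypothetical) threshold.
    This compares every left record with every right record, so it is only
    suitable for small inputs.
    """
    matches = []
    for l_rec in left:
        best_id, best_score = None, 0.0
        for r_rec in right:
            score = similarity(l_rec["name"], r_rec["name"])
            if score > best_score:
                best_id, best_score = r_rec["id"], score
        if best_id is not None and best_score >= threshold:
            matches.append((l_rec["id"], best_id, best_score))
    return matches


# Illustrative data: two sources with unrelated identifier schemes.
left = [
    {"id": "A1", "name": "Acme Corp."},
    {"id": "A2", "name": "Globex Corporation"},
]
right = [
    {"id": "B7", "name": "ACME Corp"},
    {"id": "B9", "name": "Initech LLC"},
]

matches = link_records(left, right)
print(matches)
```

The all-pairs comparison grows quadratically with the number of records, which is why practical tools typically add a blocking or indexing step before scoring candidate pairs.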
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
nlu
1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
JedAIToolkit
An open-source, highly scalable toolkit in Java for Entity Resolution.
zentity
Entity resolution for Elasticsearch.
sparker
SparkER: an Entity Resolution framework for Apache Spark
anonlink
Python implementation of anonymous linkage using cryptographic linkage keys
soweego
Link Wikidata items to large catalogs
rltk
Record Linkage ToolKit (Find and link entities)
record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems