data-profiling topic

List data-profiling repositories

cleanlab

9.3k stars, 722 forks

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

soda-core

1.9k stars, 208 forks

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas). https://www.soda.io

popmon

487 stars, 35 forks

Monitor the stability of a Pandas or Spark dataframe ⚙︎

data-profiling

70 stars, 19 forks

A set of scripts to pull metadata and data profiling metrics from relational database systems.

auctus

41 stars, 10 forks

Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index

cleanvision

935 stars, 69 forks

Automatically find issues in image datasets and practice data-centric computer vision.

metacrafter

39 stars, 6 forks

Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable, flexible rules.

fta

22 stars, 2 forks

Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Compreh...

dataqtor

16 stars, 7 forks

🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎

desbordante-core

361 stars, 61 forks

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...