data-profiling topic
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
soda-core
Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
popmon
Monitor the stability of a Pandas or Spark dataframe ⚙︎
data-profiling
A set of scripts to pull metadata and data-profiling metrics from relational database systems
auctus
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
fta
Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Compreh...
dataqtor
🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎
desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...