redflag
Mention & differentiate from similar packages in README
- Great Expectations - seems big and cumbersome - lesson: stay lean
- Evidently - "framework to evaluate, test and monitor ML models in production." - looks nice but quite tied to Jupyter, e.g. lots of plots
- ydata-quality - from the same people as profiling (below), and does not look well maintained
- Pandas Profiling - generates some alerts, e.g. see below - should test this on my usual datasets
More specialized:
- Pandera - "statistical data validation for pandas" (and only pandas) - lesson: support other data formats
- pandas_dq - looks quite nice, pandas only
- Spectacles - continuous integration tool for Looker and LookML (a GCP service?)
- Datafold - time-series and point clouds?
- dbt (Data Build Tool) - not sure what this is
- Deequ - targeted at Spark/PySpark dataframes only I think
Couple of nice posts etc
- http://mfcabrera.com/blog/pandas-dataa-validation-machine-learning.html
- See #69