arche
Deprecate/rewrite Data Quality Report
The current DQR (https://arche.readthedocs.io/en/latest/nbs/DQR.html) consists of:
- scores based on schema validation and some rules/stats https://github.com/scrapinghub/arche/blob/master/src/arche/quality_estimation_algorithm.py
- table of job stats
- some rules summary
- coverage graph (same as in the main report)
- categories tables
I don't think it provides accurate information for developers. My proposal is to remove it completely and focus on the main report instead (#119, #164).
Searching with `find . -type f -mtime -90 -name '*.ipynb' -print -exec grep -l 'data_quality_report' {} \;` shows that `data_quality_report` hasn't been used in any notebook modified in the last 3 months.
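For anyone who wants to repeat the check without `find`/`grep` (for example on Windows), here is a rough Python sketch of the same search. The function name `notebooks_mentioning` and its parameters are made up for illustration and are not part of arche; it assumes notebooks are UTF-8 text and uses file modification time, the same criterion as `-mtime -90`.

```python
import time
from pathlib import Path


def notebooks_mentioning(root, needle="data_quality_report", days=90):
    """Return notebooks under `root` modified within `days` days that contain `needle`.

    Hypothetical helper, not part of arche.
    """
    cutoff = time.time() - days * 24 * 60 * 60
    matches = []
    for path in Path(root).rglob("*.ipynb"):
        # Skip directories and notebooks last modified before the cutoff.
        if not path.is_file() or path.stat().st_mtime < cutoff:
            continue
        # Plain substring search over the raw notebook JSON, like grep -l.
        if needle in path.read_text(encoding="utf-8", errors="ignore"):
            matches.append(path)
    return sorted(matches)
```

Calling `notebooks_mentioning(".")` from the directory that holds the notebooks should list the same files that `grep -l` reports; an empty list means no recently modified notebook mentions `data_quality_report`.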