discoverx
Delta housekeeping initial version
A utility on top of discoverx that runs Delta housekeeping across multiple tables.
The analysis provides stats on Delta tables and recommendations for improvements, including:
- stats (size of tables and number of files, timestamps of the latest OPTIMIZE and VACUUM operations, stats of OPTIMIZE)
- recommendations on tables that need to be OPTIMIZE'd/VACUUM'ed
- whether tables are OPTIMIZE'd/VACUUM'ed often enough
- tables that have small files / tables for which ZORDER is not effective
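To illustrate how stats like these can turn into recommendations, here is a minimal sketch in plain Python. It is not the code in this PR: the `TableStats` class, the `recommend` function, the output keys, and both thresholds (32 MB average file size, 7 days between maintenance runs) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TableStats:
    """Hypothetical container for the per-table stats listed above."""
    name: str
    size_bytes: int
    num_files: int
    last_optimize: Optional[datetime]  # None if OPTIMIZE never ran
    last_vacuum: Optional[datetime]    # None if VACUUM never ran


# Assumed thresholds, for illustration only.
SMALL_FILE_BYTES = 32 * 1024 * 1024
MAX_MAINTENANCE_AGE = timedelta(days=7)


def is_stale(last_run: Optional[datetime], now: datetime) -> bool:
    """True if the operation never ran or ran longer ago than the allowed age."""
    return last_run is None or now - last_run > MAX_MAINTENANCE_AGE


def recommend(stats: TableStats, now: datetime) -> dict:
    """Flag a table for OPTIMIZE and/or VACUUM based on simple heuristics."""
    avg_file_size = stats.size_bytes / stats.num_files if stats.num_files else 0
    has_small_files = stats.num_files > 1 and avg_file_size < SMALL_FILE_BYTES
    return {
        "table": stats.name,
        # Small files that have not been compacted recently.
        "needs_optimize": has_small_files and is_stale(stats.last_optimize, now),
        # VACUUM has not run recently (or ever).
        "needs_vacuum": is_stale(stats.last_vacuum, now),
    }
```

A table with ten 1 MB files and no recorded OPTIMIZE or VACUUM would be flagged for both operations under these assumed thresholds.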
@edurdevic this is the same as PR #95, but opened with my user. The latest commit takes care of your final comments. Thanks!
hi @edurdevic
I still need to review this further (and document it better), but I would like you to take a look so that we agree on the approach.
In the end the refactoring was much bigger than I expected. Anyhow, apply now returns a single dataframe with 3 boolean columns:
- rec_optimize: rows that need action with OPTIMIZE
- rec_vacuum: analogous, for VACUUM
- rec_misc: other recommendations
plus 3 string columns with the reasons for each. Thanks!
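As a rough sketch of how a result with this shape could be consumed, the example below uses plain Python dictionaries as a stand-in for dataframe rows. The boolean column names come from the comment above; the `table` key, the `*_reason` column names, the sample rows, and the `tables_needing` helper are assumptions made for illustration, not the actual output schema.

```python
# Stand-in rows for the dataframe; values are made up for illustration.
rows = [
    {
        "table": "catalog.schema.orders",
        "rec_optimize": True,
        "rec_optimize_reason": "many small files",  # hypothetical column name
        "rec_vacuum": False,
        "rec_vacuum_reason": "",                    # hypothetical column name
        "rec_misc": False,
        "rec_misc_reason": "",                      # hypothetical column name
    },
    {
        "table": "catalog.schema.events",
        "rec_optimize": False,
        "rec_optimize_reason": "",
        "rec_vacuum": True,
        "rec_vacuum_reason": "VACUUM never ran",
        "rec_misc": False,
        "rec_misc_reason": "",
    },
]


def tables_needing(rows, flag):
    """Return (table, reason) pairs for rows where the given boolean flag is set."""
    reason_col = f"{flag}_reason"  # assumed naming convention
    return [(row["table"], row[reason_col]) for row in rows if row[flag]]


for flag in ("rec_optimize", "rec_vacuum", "rec_misc"):
    for table, reason in tables_needing(rows, flag):
        print(f"{flag}: {table} ({reason})")
```

Keeping one boolean and one reason column per recommendation type means each kind of action can be filtered independently while the rest of the row stays available.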
@edurdevic ready to review, thanks
@edurdevic pls take another look, thanks