discoverx

Delta housekeeping initial version

Open • lorenzorubi-db opened this issue 1 year ago • 4 comments

Utility on top of discoverx to run Delta housekeeping across multiple tables.

An analysis that provides statistics on Delta tables and recommendations for improvements, including:

  • stats: table size and number of files, timestamps of the latest OPTIMIZE and VACUUM operations, and OPTIMIZE statistics
  • recommendations on which tables need OPTIMIZE or VACUUM
  • whether tables are optimized/vacuumed often enough
  • tables that have small files, or for which ZORDER is not effective
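A minimal sketch of how per-table stats like those listed above could be turned into recommendations. The field names, the thresholds (32 MiB average file size, 7 days between OPTIMIZE/VACUUM runs) and the `recommend` helper are all hypothetical, chosen only for illustration; they are not taken from the discoverx implementation, and the ZORDER-effectiveness check is left out. Plain Python is used so the sketch runs without a Spark session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical thresholds, for illustration only.
SMALL_FILE_BYTES = 32 * 1024 * 1024  # average file size below this counts as "small files"
MAX_DAYS_SINCE_OPTIMIZE = 7
MAX_DAYS_SINCE_VACUUM = 7


@dataclass
class TableStats:
    """Per-table stats of the kind listed above (field names are hypothetical)."""
    name: str
    size_bytes: int
    num_files: int
    last_optimize: Optional[datetime]  # None if the table was never optimized
    last_vacuum: Optional[datetime]    # None if the table was never vacuumed


def recommend(stats: TableStats, now: datetime) -> List[str]:
    """Return human-readable recommendations for one table."""
    recs = []
    # Small files: average file size below the threshold.
    if stats.num_files > 0 and stats.size_bytes / stats.num_files < SMALL_FILE_BYTES:
        recs.append("small files: consider OPTIMIZE")
    # OPTIMIZE frequency.
    if stats.last_optimize is None:
        recs.append("never optimized")
    elif now - stats.last_optimize > timedelta(days=MAX_DAYS_SINCE_OPTIMIZE):
        recs.append("OPTIMIZE not run often enough")
    # VACUUM frequency.
    if stats.last_vacuum is None:
        recs.append("never vacuumed")
    elif now - stats.last_vacuum > timedelta(days=MAX_DAYS_SINCE_VACUUM):
        recs.append("VACUUM not run often enough")
    return recs
```

For example, a table of 100 files averaging 10 MiB, never optimized and last vacuumed 30 days ago, would get all three of "small files: consider OPTIMIZE", "never optimized" and "VACUUM not run often enough".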

lorenzorubi-db • Jan 09 '24 20:01

@edurdevic Same as PR #95, but opened with my user. The latest commit takes care of your final comments. Thanks!

lorenzorubi-db • Jan 09 '24 20:01

Hi @edurdevic, I still need to review further (and document better), but I would like you to take a look so that we agree on the approach. In the end, the refactoring was much bigger than I expected. Anyhow, apply now returns a single DataFrame with 3 boolean columns:

  • rec_optimize: flags rows that need an OPTIMIZE
  • rec_vacuum: the same for VACUUM
  • rec_misc: other recommendations

plus 3 string columns with the reason for each. Thanks!
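A minimal sketch of the output shape described above: one row per table (an assumption) with the rec_optimize, rec_vacuum and rec_misc flags plus one reason string per flag. The "table" column, the *_reason column names and both helper functions are hypothetical; in the actual utility the result is a Spark DataFrame, while plain dicts are used here so the sketch runs standalone.

```python
from typing import Dict, List

Row = Dict[str, object]


def to_recommendation_row(table: str, optimize: List[str], vacuum: List[str], misc: List[str]) -> Row:
    """Collapse per-category findings into one output row.

    rec_optimize / rec_vacuum / rec_misc follow the thread; the "table" and
    *_reason column names are hypothetical. A flag is True when its category
    has at least one finding, and the reason joins those findings.
    """
    return {
        "table": table,
        "rec_optimize": bool(optimize),
        "rec_optimize_reason": "; ".join(optimize),
        "rec_vacuum": bool(vacuum),
        "rec_vacuum_reason": "; ".join(vacuum),
        "rec_misc": bool(misc),
        "rec_misc_reason": "; ".join(misc),
    }


def needs_action(rows: List[Row]) -> List[str]:
    """Names of tables flagged by at least one of the three boolean columns."""
    return [r["table"] for r in rows if r["rec_optimize"] or r["rec_vacuum"] or r["rec_misc"]]
```

Keeping a boolean next to each reason string lets callers filter on the flag (e.g. only tables with rec_vacuum set) without parsing free text.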

lorenzorubi-db • Jan 28 '24 12:01

@edurdevic Ready for review, thanks.

lorenzorubi-db • Feb 04 '24 20:02

@edurdevic Please take another look, thanks.

lorenzorubi-db • Feb 11 '24 13:02