
Update input_size_bytes, input_records, and input_records

Open raghumdani opened this issue 2 years ago • 0 comments

For rebase and backfill scenarios, we correctly calculate input stats. For incremental compaction, however, these stats only represent the incremental delta. Because of this, we currently have to hack our way around to determine the actual data scanned. Updating these audit fields would make the total data scanned easy to calculate:

  1. For rebase, total data scanned (at rest size) is input_size_bytes - untouched_size_bytes.
  2. For backfill, total data scanned (at rest size) is input_size_bytes - untouched_size_bytes.
  3. For incremental, total data scanned (at rest size) is input_size_bytes - untouched_size_bytes.

Hence, a single expression would represent the data scanned for every type of compaction.
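As a rough illustration of that single expression, here is a minimal sketch. `AuditStats` and `total_data_scanned_bytes` are hypothetical names made up for this example and are not part of deltacat's actual audit API; only the field names `input_size_bytes` and `untouched_size_bytes` come from this issue.

```python
from dataclasses import dataclass


@dataclass
class AuditStats:
    """Hypothetical stand-in for the audit fields discussed in this issue."""

    input_size_bytes: int      # at-rest size of all input considered
    untouched_size_bytes: int  # at-rest size of input that was not read


def total_data_scanned_bytes(stats: AuditStats) -> int:
    """Same formula for rebase, backfill, and incremental compaction."""
    return stats.input_size_bytes - stats.untouched_size_bytes


# Example values are invented purely for illustration.
stats = AuditStats(input_size_bytes=1_000, untouched_size_bytes=250)
scanned = total_data_scanned_bytes(stats)
```

The point of the proposal is that callers would no longer need to branch on the compaction type: once `input_size_bytes` covers the full input for incremental runs too, the subtraction above would hold in all three cases.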

raghumdani, Oct 29 '23 01:10