hooli-data-eng-pipelines

Moving from auto materialization to automation conditions

Open: slopp opened this issue 6 months ago • 1 comment

The previous version of hooli relied on auto-materialization policies. These policies were primarily based on the default eager auto-materialization strategy, with some customization to prevent missing or outdated partitions from blocking downstream non-partitioned asset runs. The root assets were run by a cron schedule, with most changes propagating through downstream assets daily.

This PR replaces those auto-materialization policies with their counterpart automation conditions. The PR also adds some complexity to introduce new automation scenarios, specifically:

  • The core assets orders, orders_cleaned, users, users_cleaned, and orders_augmented are still scheduled by a daily job
  • company_stats remains eager, and should see changes propagated daily
  • location_stats remains eager, and should see changes when the upstream locations asset is run manually
  • sku_stats is updated to use on_cron to run monthly
  • company_perf is updated to defer its run timing to its downstream assets. The critical downstream is avg_orders, which is updated to run on odd-numbered days, so the end result is that both company_perf and avg_orders should only run on odd-numbered days
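
As a rough illustration of the scenarios above, the conditions could be declared along these lines. This is a minimal sketch, not the code in this PR: it assumes a recent Dagster release that exposes `AutomationCondition` at the top level, the asset bodies are empty placeholders, and both cron strings are assumptions (the PR description only says "monthly" and "odd days").

```python
import dagster as dg

# Eager: requests a run whenever an upstream asset is updated.
@dg.asset(automation_condition=dg.AutomationCondition.eager())
def company_stats() -> None: ...

# Monthly via on_cron. The exact schedule is an assumption:
# midnight on the first day of each month.
@dg.asset(automation_condition=dg.AutomationCondition.on_cron("0 0 1 * *"))
def sku_stats() -> None: ...

# company_perf has no timing of its own; it defers to the
# conditions of its downstream assets (here, avg_orders).
@dg.asset(
    automation_condition=dg.AutomationCondition.any_downstream_conditions()
)
def company_perf() -> None: ...

# Odd-numbered days. The cron string is an assumption:
# day-of-month 1-31 with a step of 2.
@dg.asset(
    deps=[company_perf],
    automation_condition=dg.AutomationCondition.on_cron("0 0 1-31/2 * *"),
)
def avg_orders() -> None: ...
```

Because `company_perf` only inherits conditions from downstream, changing the schedule on `avg_orders` would change when both assets run.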

The remaining assets are unchanged.

Testing this will most likely require merging and then manually watching for behavior.
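
Before merging, the expected odd-day cadence can at least be sanity-checked offline. The sketch below uses only the standard library and expands the day-of-month field of a hypothetical cron string (`0 0 1-31/2 * *`, an assumption, not necessarily the one used in the PR). The helper `expand_cron_field` is illustrative and covers only the common `*`, `a`, `a-b`, and `/step` forms.

```python
from __future__ import annotations


def expand_cron_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one numeric cron field into the sorted values it matches.

    Supports comma-separated parts of the form '*', 'a', 'a-b',
    each with an optional '/step' suffix.
    """
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_s = part.partition("/")
        step = int(step_s) if step_s else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # 'a/step' is read as 'a-hi/step'; a bare 'a' matches only a.
            end = hi if step_s else start
        values.update(range(start, end + 1, step))
    return sorted(values)


# Hypothetical schedule: midnight on every odd-numbered day of the month.
ODD_DAYS_CRON = "0 0 1-31/2 * *"
day_of_month = ODD_DAYS_CRON.split()[2]
odd_days = expand_cron_field(day_of_month, 1, 31)
print(odd_days)
```

One thing this makes visible: in a 31-day month the schedule fires on the 31st and again on the 1st, so "odd-numbered days" does not always mean "every other day".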

One important note is that these new conditions are managed by a sensor, not a daemon.

slopp, Aug 16 '24 17:08