hooli-data-eng-pipelines
Moving from auto-materialization policies to automation conditions
The previous version of hooli relied on auto-materialization policies. These policies were primarily based on the default eager auto-materialization strategy, with some customization to prevent missing or outdated partitions from blocking runs of downstream non-partitioned assets. The root assets were run on a cron schedule, with most changes propagating through downstream assets daily.
This PR replaces those auto-materialization policies with their counterpart automation conditions. The PR also adds some complexity to introduce new automation scenarios, specifically:
- The core assets `orders`, `orders_cleaned`, `users`, `users_cleaned`, and `orders_augmented` are still scheduled by a daily job
- `company_stats` remains eager, and should see changes propagated daily
- `location_stats` remains eager, and should see changes if the upstream `locations` asset is manually run
- `sku_stats` is updated to use `on_cron` to run monthly
- `company_perf` is updated to defer its run timing to its downstream assets, most importantly `avg_orders`, which is updated to run on odd-numbered days. The end result is that both `company_perf` and `avg_orders` should only run on odd-numbered days
The remaining assets are unchanged.
Testing this will most likely require merging and then manually watching for behavior.
One important note is that these new conditions are managed by a sensor, not a daemon.