
Eco counter dagify

Open gabrielwol opened this issue 10 months ago • 1 comments

What this pull request accomplishes:

  • Builds on Nate's ecocounter code to create a daily Airflow DAG.
  • Automatically inserts new sites/channels (marked as validated = False) and sends a notification in Slack.
  • Updates all the ecocounter DDL in GitHub to match the current definitions in bigdata (and adds a validated column to sites and flows).
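
To illustrate the second point, here is a minimal Python sketch of the idea: compare the sites returned by the API against those already stored, insert the unknown ones as unvalidated, and build a notification text. It is not the DAG's actual code. The names `Site`, `find_new_sites`, `slack_message`, and the `id`/`name` keys are hypothetical, and the real DAG's database and Slack calls are left out.

```python
from dataclasses import dataclass


@dataclass
class Site:
    site_id: int
    name: str
    validated: bool = False  # assumption: new sites start out unvalidated


def find_new_sites(api_sites, known_site_ids):
    """Return Site records for API sites that are not yet in the database."""
    return [
        Site(site_id=s["id"], name=s["name"])
        for s in api_sites
        if s["id"] not in known_site_ids
    ]


def slack_message(new_sites):
    """Build the notification text listing the newly inserted sites."""
    lines = [f"{len(new_sites)} new ecocounter site(s) inserted (validated = False):"]
    lines += [f"- {s.site_id}: {s.name}" for s in new_sites]
    return "\n".join(lines)


# Hypothetical API response and set of site ids already stored
api_sites = [{"id": 1, "name": "Site A"}, {"id": 2, "name": "Site B"}]
new_sites = find_new_sites(api_sites, known_site_ids={1})
message = slack_message(new_sites)
print(message)
```

Keeping detection and message building as plain functions makes them easy to test without a database or Slack connection.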

Issue(s) this solves:

  • Closes #912

What, in particular, needs to be reviewed:

  • [ ] Think through next steps on how to use the validated column.

What needs to be done by a sysadmin after this PR is merged

  • Replace all references to gwolofs schema with ecocounter. Also add validated columns to ecocounter and backfill data since the last ecocounter pull.
  • Update data_scripts path

gabrielwol, Apr 09 '24 16:04

@Nate-Wessel addressed your comments + added a readme section for the DAG. Once you're OK with how we're handling the unvalidated flows, I can merge and adjust all the scripts to point to ecocounter schema instead of gwolofs. @chmnata can you quickly look over the DAG? It is working as expected: https://trans-bdit.intra.prod-toronto.ca/airflow/dags/ecocounter_pull/grid. Nate has reviewed the rest.

gabrielwol, Apr 11 '24 19:04

@Nate-Wessel, @chmnata Ready for re-review! The DAG now inserts into ecocounter.counts_unfiltered, and a new view, ecocounter.counts, contains only validated flows. I also added partitioning to ecocounter.counts_unfiltered at Natalie's request.
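
As a rough illustration of the table/view split described above, the following Python sketch uses an in-memory SQLite database as a stand-in for bigdata. The column names (`flow_id`, `datetime_bin`, `volume`), the join to `flows`, and the unqualified table names are assumptions made for the example. The actual view definition in the PR may differ, and partitioning is not shown because SQLite does not support it.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL database
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flows (flow_id INTEGER PRIMARY KEY, validated INTEGER);
CREATE TABLE counts_unfiltered (flow_id INTEGER, datetime_bin TEXT, volume INTEGER);

-- Assumed shape of the view: only counts whose flow is validated
CREATE VIEW counts AS
SELECT c.flow_id, c.datetime_bin, c.volume
FROM counts_unfiltered AS c
JOIN flows AS f ON f.flow_id = c.flow_id
WHERE f.validated = 1;
""")

# Flow 1 is validated, flow 2 was auto-inserted and is not yet validated
conn.executemany("INSERT INTO flows VALUES (?, ?)", [(1, 1), (2, 0)])
conn.executemany(
    "INSERT INTO counts_unfiltered VALUES (?, ?, ?)",
    [(1, "2024-04-15 00:00", 12), (2, "2024-04-15 00:00", 7)],
)

# The view exposes only the validated flow's counts
rows = conn.execute("SELECT flow_id, volume FROM counts").fetchall()
print(rows)
```

With this layout, the DAG can keep loading every flow into the unfiltered table while downstream users who query the view see only flows that someone has marked as validated.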

gabrielwol, Apr 15 '24 21:04