Add sanity checks to the data pipelines

Open joshzarrabi opened this issue 2 years ago • 0 comments

Background

As engineers developing features, we would like to have confidence that the data looks something like we would expect it to. There are basic things we look for in the data after running updates, and we can automate some of those checks.

What's Missing / Problems with the current approach

Recently, as described in #1786, we ran into an issue where some pct_share metrics were both:

  • Impossible (added up to well over 100%)
  • Not the same as what was generated in the dev pipeline.

These are both things that could have been caught in a basic pipeline check.
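As a rough illustration of the two checks described above, here is a minimal sketch in plain Python. The row layout (a list of dicts), the column names `state` and `race`, the function names, and the tolerance values are all assumptions made for the example; they are not taken from the tracker's codebase.

```python
from collections import defaultdict


def find_impossible_pct_shares(rows, group_key, share_col, tolerance=1.0):
    """Return {group: total} for each group whose shares sum to more than 100 + tolerance."""
    totals = defaultdict(float)
    for row in rows:
        value = row[share_col]
        if value is not None:  # skip suppressed or missing values
            totals[row[group_key]] += value
    return {group: total for group, total in totals.items() if total > 100 + tolerance}


def find_mismatches(prod_rows, dev_rows, key_cols, share_col, tolerance=0.1):
    """Return the keys whose share differs between two outputs, or that appear in only one."""

    def index(rows):
        return {tuple(row[k] for k in key_cols): row[share_col] for row in rows}

    prod, dev = index(prod_rows), index(dev_rows)
    mismatched = []
    for key in sorted(prod.keys() | dev.keys()):
        if key not in prod or key not in dev:
            mismatched.append(key)
        elif abs(prod[key] - dev[key]) > tolerance:
            mismatched.append(key)
    return mismatched
```

The small tolerance on the sum allows for rounding in the published percentages. Whether a total well below 100% should also be flagged depends on how suppressed values are handled, so this sketch only flags totals that are too high.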

Proposed Solution

We should add steps to the Airflow pipeline, right before the exporter step, that run basic sanity checks on the data. If any of these steps fail, the pipeline should fail and the API will not be updated.
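One way such a step could work is sketched below. An Airflow task fails when its Python callable raises an exception, and with the default trigger rule the downstream tasks (here, the exporter) are then not run. The sketch leaves out the Airflow wiring and shows only the callable's logic. The names `SanityCheckError`, `run_sanity_checks`, and the two example checks are hypothetical.

```python
class SanityCheckError(Exception):
    """Raised when one or more sanity checks fail (hypothetical name)."""


def check_not_empty(rows):
    """Return a problem description, or None if the check passes."""
    return "dataset has no rows" if not rows else None


def check_pct_share_in_range(rows):
    """Flag individual pct_share values outside the 0-100 range."""
    bad = [
        row for row in rows
        if row.get("pct_share") is not None and not 0 <= row["pct_share"] <= 100
    ]
    return f"{len(bad)} row(s) with pct_share outside 0-100" if bad else None


def run_sanity_checks(rows, checks):
    """Run every check and raise a single error listing all failures."""
    failures = []
    for check in checks:
        problem = check(rows)
        if problem:
            failures.append(f"{check.__name__}: {problem}")
    if failures:
        # Raising here is what would make the surrounding pipeline task fail.
        raise SanityCheckError("; ".join(failures))
```

Collecting all failures before raising, rather than stopping at the first one, means a single failed run reports everything that looks wrong with the dataset.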

End State

This epic will be done when we have tasks in every pipeline that ensure the data does not have any glaring errors.

How will this move the needle on health equity?

This will help us avoid obvious errors in the data that could cause our users to lose confidence in the tracker.

Tasks:

  • [ ] #1806
  • [x] #1807

joshzarrabi, Sep 21 '22 02:09