health-equity-tracker
Add sanity checks to the data pipelines
Background
As engineers developing features, we would like to have confidence that the data looks something like we would expect it to. There are basic things we look for in the data after running updates, and we can automate some of those checks.
What's Missing / Problems with the current approach
Recently, as described in #1786, we ran into an issue where some pct_share metrics were both:
- Impossible (added up to well over 100%)
- Not the same as what was generated in the dev pipeline.
These are both things that could have been caught in a basic pipeline check.
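As a rough illustration of what those two checks could look like, here is a minimal sketch in plain Python. The function names, the column names (`state_fips`, `race`, `cases_pct_share`), and the tolerances are all hypothetical and not taken from the codebase. It also assumes the rows passed in cover mutually exclusive groups and exclude any aggregate "All" row, since otherwise a sum over 100% would be expected.

```python
from collections import defaultdict


def find_impossible_pct_share(rows, group_keys, pct_col, tolerance=0.5):
    """Return {group: total} for groups whose pct_share values sum past 100%.

    Assumes rows are mutually exclusive breakdown groups (no "All" row).
    The tolerance (hypothetical value) allows for rounding.
    """
    totals = defaultdict(float)
    for row in rows:
        value = row.get(pct_col)
        if value is None:  # skip suppressed / missing values
            continue
        key = tuple(row[k] for k in group_keys)
        totals[key] += value
    return {key: total for key, total in totals.items() if total > 100 + tolerance}


def find_pct_share_mismatches(rows, reference_rows, key_cols, pct_col, tolerance=0.1):
    """Return (key, actual, expected) for rows that differ from a reference run."""
    reference = {tuple(r[k] for k in key_cols): r.get(pct_col) for r in reference_rows}
    mismatches = []
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        expected = reference.get(key)
        actual = row.get(pct_col)
        if expected is None or actual is None:
            continue  # only compare rows present in both runs
        if abs(actual - expected) > tolerance:
            mismatches.append((key, actual, expected))
    return mismatches


# Made-up example data, for illustration only.
prod_rows = [
    {"state_fips": "01", "race": "A", "cases_pct_share": 60.0},
    {"state_fips": "01", "race": "B", "cases_pct_share": 40.0},
    {"state_fips": "02", "race": "A", "cases_pct_share": 90.0},
    {"state_fips": "02", "race": "B", "cases_pct_share": 75.0},
]
dev_rows = [dict(r) for r in prod_rows]
dev_rows[3]["cases_pct_share"] = 10.0

impossible = find_impossible_pct_share(prod_rows, ["state_fips"], "cases_pct_share")
mismatches = find_pct_share_mismatches(
    prod_rows, dev_rows, ["state_fips", "race"], "cases_pct_share"
)
print(impossible)   # {('02',): 165.0}
print(mismatches)   # [(('02', 'B'), 75.0, 10.0)]
```

Returning the offending groups rather than a bare boolean makes the failure message actionable when a check trips.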
Proposed Solution
We should add steps to the Airflow pipeline, right before the exporter step, that run basic sanity checks on the data. If any of these steps fail, the pipeline should fail and the API will not be updated.
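One possible shape for such a step is sketched below, assuming each check is a function that returns a list of error messages. `SanityCheckError`, `run_sanity_checks`, and the individual checks are hypothetical names, not existing code. The sketch relies on the fact that an Airflow task is marked failed when its Python callable raises an exception, so with the default trigger rule the downstream exporter task would not run. How this gets wired into our DAGs is left open here.

```python
class SanityCheckError(Exception):
    """Raised when one or more sanity checks fail (hypothetical name)."""


def check_not_empty(rows):
    """Fail if the dataset has no rows at all."""
    return [] if rows else ["dataset has no rows"]


def check_pct_share_in_range(rows, pct_col="pct_share"):
    """Fail for any non-null pct_share value outside 0-100."""
    errors = []
    for i, row in enumerate(rows):
        value = row.get(pct_col)
        if value is not None and not (0 <= value <= 100):
            errors.append(f"row {i}: {pct_col}={value} is outside 0-100")
    return errors


def run_sanity_checks(rows, checks):
    """Run every check, then raise once with all collected failures.

    Raising is what would make the surrounding pipeline task fail.
    """
    failures = []
    for check in checks:
        failures.extend(check(rows))
    if failures:
        raise SanityCheckError("; ".join(failures))
    return True


checks = [check_not_empty, check_pct_share_in_range]
run_sanity_checks([{"pct_share": 40.0}], checks)  # passes, returns True
```

Collecting all failures before raising means a single pipeline run reports every problem at once instead of stopping at the first one.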
End State
This epic will be done when we have tasks in every pipeline that ensure the data does not have any glaring errors.
How will this move the needle on health equity?
This will help us avoid obvious errors in the data that could cause our users to lose confidence in the tracker.
Tasks:
- [ ] #1806
- [x] #1807