health-equity-tracker
Verify that the output of each `prod` pipeline is exactly the same as that of the `dev` pipeline.
Is your feature request related to a problem? Please describe.
Recently, we hit a strange situation where the output of our prod pipeline was not exactly the same as the output of the dev pipeline. This should never happen; if it does, we should block the API update and investigate.
Describe the solution you'd like
We should add a new Airflow job to each prod pipeline that runs right before the exporter step and ensures that the updated prod BigQuery tables are exactly the same as the dev BigQuery tables.
We can utilize the logic in the https://github.com/SatcherInstitute/health-equity-tracker/blob/main/e2e_tests/scripts/ensure_datasets_equal.py script to do this.
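To make "exactly the same" concrete, here is a minimal sketch of the kind of row-level comparison the new step needs. It is not the code in `ensure_datasets_equal.py`; it assumes each table's rows have already been loaded into lists of dicts (fetching them from BigQuery is left out), and the function names `tables_equal` and `mismatched_tables` are hypothetical. Rows are compared as a multiset, so row order is ignored but duplicate rows still count.

```python
import json
from collections import Counter
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def _canonical(row: Row) -> str:
    # Serialize with sorted keys so column order does not matter;
    # default=str covers dates/timestamps that json cannot encode natively.
    return json.dumps(row, sort_keys=True, default=str)


def tables_equal(prod_rows: Iterable[Row], dev_rows: Iterable[Row]) -> bool:
    """HYPOTHETICAL helper: True if both tables hold exactly the same rows.

    Row order is ignored (BigQuery gives no ordering guarantee without
    ORDER BY), but duplicates are counted, so multiplicity must match.
    """
    return Counter(map(_canonical, prod_rows)) == Counter(map(_canonical, dev_rows))


def mismatched_tables(prod: Mapping[str, List[Row]],
                      dev: Mapping[str, List[Row]]) -> List[str]:
    """HYPOTHETICAL helper: names of tables that differ or exist on only one side."""
    names = sorted(set(prod) | set(dev))
    return [
        name for name in names
        if name not in prod or name not in dev
        or not tables_equal(prod[name], dev[name])
    ]
```

Treating a table that exists in only one dataset as a mismatch means a missing or extra table also blocks the export, not just differing rows.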
Additional context
An important thing to note is that this is the first time the logic of the dev and prod pipelines will diverge. In my opinion, we should handle this in the step itself, so the actual Airflow pipelines stay the same, but the verify step (or whatever we choose to call it) will simply always pass on the dev pipeline.
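One way to keep the DAGs identical while letting the step pass on dev is to branch on the environment inside the task callable. The sketch below assumes the environment is exposed through an environment variable named `PIPELINE_ENV`, and that `load_prod`, `load_dev`, and `compare` are supplied by the caller; all of these names are hypothetical. It relies only on the fact that an uncaught exception in a task callable fails the Airflow task.

```python
import os
from typing import Any, Callable, List, Mapping, Optional

Tables = Mapping[str, List[Mapping[str, Any]]]


class DatasetMismatchError(Exception):
    """Raised on prod when any table differs from its dev counterpart."""


def verify_prod_matches_dev(
    load_prod: Callable[[], Tables],
    load_dev: Callable[[], Tables],
    compare: Callable[[Tables, Tables], List[str]],
    env: Optional[str] = None,
) -> str:
    # HYPOTHETICAL: the environment comes from a PIPELINE_ENV variable;
    # the real deployment may expose it differently.
    if env is None:
        env = os.environ.get("PIPELINE_ENV", "dev")
    if env != "prod":
        # Same DAG on every environment; on dev this step always passes.
        return "skipped: not prod"
    mismatched = compare(load_prod(), load_dev())
    if mismatched:
        # Raising fails the task; with Airflow's default trigger rule the
        # downstream exporter does not run, so the API is not updated.
        raise DatasetMismatchError(
            f"prod differs from dev in: {', '.join(mismatched)}")
    return "ok"
```

In a DAG this could be wrapped in a `PythonOperator` (for example via `functools.partial` to bind the loaders) and placed upstream of the exporter task, so every pipeline keeps the same task graph.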