Verify that the output of each `prod` pipeline is exactly the same as the `dev` pipeline.

Open joshzarrabi opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Recently, the output of our prod pipeline was not exactly the same as the output of the dev pipeline. This should never happen; if it does, we should fail the pipeline before the API is updated and investigate.

Describe the solution you'd like

We should add a new Airflow job that runs right before the exporter step on each prod pipeline and ensures that the updated prod BigQuery tables are exactly the same as the dev BigQuery tables.

We can utilize the logic in the https://github.com/SatcherInstitute/health-equity-tracker/blob/main/e2e_tests/scripts/ensure_datasets_equal.py script to do this.
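As a rough illustration of the kind of check meant here, the sketch below compares two sets of tables row by row. It is not the logic from `ensure_datasets_equal.py`; the function names are hypothetical, and tables are assumed to already be loaded into memory as lists of row dicts rather than read from BigQuery.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def _row_key(row: Row) -> tuple:
    # Sort columns by name so column order does not affect the comparison.
    return tuple(sorted((col, repr(val)) for col, val in row.items()))


def tables_equal(dev_rows: List[Row], prod_rows: List[Row]) -> bool:
    # Compare as multisets: row order is ignored, duplicate rows are counted.
    return Counter(map(_row_key, dev_rows)) == Counter(map(_row_key, prod_rows))


def find_mismatched_tables(
    dev: Dict[str, List[Row]], prod: Dict[str, List[Row]]
) -> List[str]:
    # A table is mismatched if it is missing on either side or its rows differ.
    mismatched = []
    for name in sorted(set(dev) | set(prod)):
        if name not in dev or name not in prod:
            mismatched.append(name)
        elif not tables_equal(dev[name], prod[name]):
            mismatched.append(name)
    return mismatched
```

An empty result from `find_mismatched_tables` would mean the prod and dev outputs agree, so the exporter step can proceed.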

Additional context

An important thing to note is that this is the first time the logic will diverge between the dev and prod pipelines. In my opinion, we should handle this in the step itself, so the actual Airflow pipelines stay the same, but the verify step (or whatever we choose to call it) will just always pass on the dev pipeline.
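One possible shape for such a step is sketched below. Everything here is an assumption for illustration: the `verify_step` name, the `environment` argument, and the `compare` callable (which stands in for whatever comparison we end up using and returns the names of mismatched tables). The function is plain Python, so it could be wrapped in whatever Airflow task type the pipelines already use.

```python
from typing import Callable, List


def verify_step(environment: str, compare: Callable[[], List[str]]) -> None:
    # On dev there is nothing to compare against, so the step always passes
    # and the comparison is never run.
    if environment != "prod":
        return

    # On prod, run the comparison and fail the step on any mismatch so the
    # exporter step that follows does not run.
    mismatched = compare()
    if mismatched:
        raise ValueError(
            "prod tables differ from dev: " + ", ".join(mismatched)
        )
```

Raising an exception is used here because a failed task normally stops its downstream tasks, which matches the goal of not updating the API when the outputs differ.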

joshzarrabi, Sep 21 '22 02:09