[ELE-873] Ability to run "volume_anomalies" test ignoring "timestamp_column" in config
Slack link
https://elementary-community.slack.com/archives/C02CTC89LAX/p1684167121364969
Use case
Sometimes you need to run "volume_anomalies" both bucketed by "timestamp_column" and over the entire table. Currently, this can only be achieved by removing "timestamp_column" from the model config:
config:
  elementary:
    timestamp_column: processed_at
and passing it explicitly to every test that needs it:
models:
  - name: model_name
    tests:
      - elementary.volume_anomalies: ## total volume
          tags: ['elementary']
      - elementary.volume_anomalies: ## volume by bucket
          tags: ['elementary']
          timestamp_column: processed_at
      - elementary.~:
          timestamp_column: processed_at
However, this solution is not elegant, especially if there are many tests on the model that require the "timestamp_column".
It would be nice to have an intuitive test like "total_volume_anomalies" that ignores the "timestamp_column". This would allow us to write the following:
models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.total_volume_anomalies:
          tags: ['elementary']
      - elementary.volume_anomalies:
          tags: ['elementary']
Alternatively, we can add a parameter in "volume_anomalies" like this:
models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
          total_rows: true
      - elementary.volume_anomalies:
          tags: ['elementary']
Another option is to allow overriding "timestamp_column" with "null" at the test level, like this:
models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
          timestamp_column: null
      - elementary.volume_anomalies:
          tags: ['elementary']
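For the "null" override to work, the resolution logic would have to distinguish "timestamp_column is not set on the test" from "timestamp_column is explicitly set to null on the test". The sketch below illustrates that precedence in plain Python. It is only an illustration of the proposed behaviour: the function name `resolve_timestamp_column` and the dict shapes are hypothetical, and the real logic would live in Elementary's dbt macros rather than in Python.

```python
from typing import Optional

# Sentinel meaning "the test did not mention timestamp_column at all".
_UNSET = object()


def resolve_timestamp_column(model_config: dict, test_args: dict) -> Optional[str]:
    """Hypothetical precedence: a test-level value wins, even when it is None."""
    value = test_args.get("timestamp_column", _UNSET)
    if value is _UNSET:
        # Key absent on the test: fall back to the model-level elementary config.
        return model_config.get("elementary", {}).get("timestamp_column")
    # Key present on the test: use it as-is. None means "monitor the whole table".
    return value


model_config = {"elementary": {"timestamp_column": "processed_at"}}

# First test in the example above: explicit null, so total row count is monitored.
total = resolve_timestamp_column(
    model_config, {"tags": ["elementary"], "timestamp_column": None}
)

# Second test: nothing set, so the model-level "processed_at" is inherited.
bucketed = resolve_timestamp_column(model_config, {"tags": ["elementary"]})
```

Here `total` resolves to `None` and `bucketed` to `"processed_at"`. A plain "use the test value if it is truthy, otherwise the model value" check would not be enough, because it would silently fall back to the model config for the explicit null.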
Alternatives
Maayan Salom: make this a default monitor in "volume_anomalies" (so a single test would check both, and of course fail if either check fails).
If we make this a default monitor in "volume_anomalies", many users will experience unexpected failures on "volume_anomalies" after updating. It would be even more confusing because the failures would only start appearing a few days after the update.
Would you be interested in contributing it?
I'd be able to provide assistance if necessary.