[ELE-873] Ability to run "volume_anomalies" test ignoring "timestamp_column" in config

Open giletich-municorn opened this issue 2 years ago • 0 comments

Slack link

https://elementary-community.slack.com/archives/C02CTC89LAX/p1684167121364969

Use case

Sometimes you need to run volume tests both bucketed by "timestamp_column" and over the entire table. Currently, this can only be achieved by removing "timestamp_column" from the model config:

config:
  elementary:
    timestamp_column: processed_at

and passing it explicitly to every test:

models:
  - name: model_name
    tests:
      - elementary.volume_anomalies: ## total volume
          tags: ['elementary']
      - elementary.volume_anomalies: ## volume by bucket
          tags: ['elementary']
          timestamp_column: processed_at
      - elementary.~:
          timestamp_column: processed_at

However, this workaround is not elegant, especially when many tests on the model require "timestamp_column", since each of them has to repeat it.

It would be nice to have an intuitive test like "total_volume_anomalies" that ignores the "timestamp_column". This would allow us to write the following:

models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.total_volume_anomalies:
          tags: ['elementary']
      - elementary.volume_anomalies:
          tags: ['elementary']

Alternatively, we could add a parameter to "volume_anomalies" like this:

models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
          total_rows: true
      - elementary.volume_anomalies:
          tags: ['elementary']

Another option is to allow overriding "timestamp_column" with "null" at the test level, like this:

models:
  - name: model_name
    config:
      elementary:
        timestamp_column: processed_at
    tests:
      - elementary.volume_anomalies:
          tags: ['elementary']
          timestamp_column: null
      - elementary.volume_anomalies:
          tags: ['elementary']

Alternatives

Maayan Salom: make this a default monitor in "volume_anomalies" (so a single test would actually check both, and of course fail if either of them fails).

If we make this a default monitor in "volume_anomalies", many users will experience unexpected "volume_anomalies" failures after updating. It would be even more confusing because the failures would only start occurring a few days after the update.

Would you be interested in contributing it?

I'd be able to provide assistance if necessary.


May 15 '23 17:05 giletich-municorn