Add filtering options for the data monitoring tests
Currently we either use the timestamp column as a filter or apply no filtering at all (the test runs on the entire table). For some use cases this is not enough.
Use cases:
- Snapshot tables - the timestamp is not relevant; you want to filter to rows where 'valid_to' is null
- Big tables with no timestamp column - order by + limit?
dbt supports where conditions on tests using this macro: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/tests/where_subquery.sql (documented here: https://docs.getdbt.com/reference/resource-configs/where)
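For reference, this is how a where condition is attached to a dbt test today; a minimal sketch, assuming the monitoring tests would accept the same config (the elementary.table_anomalies test and the model/column names below are only illustrative):

```yaml
# models/schema.yml -- illustrative names
version: 2

models:
  - name: my_snapshot_model
    tests:
      - elementary.table_anomalies:
          config:
            # dbt's get_where_subquery macro wraps the tested relation in
            # (select * from <relation> where <condition>)
            where: "valid_to is null"  # only monitor the current rows
```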
We need to understand whether there are use cases where you need both the timestamp and additional filtering. Should there be a different behaviour for such tests?
For the snapshot use case it's specifically where dbt_valid_to is null.
We may also want to filter out test data where there is some internal business logic, e.g. some_column != True. Test data often appears anomalous, so excluding it from anomaly detection would make the monitors more accurate. In these cases we may want to use the timestamp filtering as well as the where clause; if supporting both is terribly complex, we could handle both filters in the where clause.
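If supporting both mechanisms together is too complex, a single where clause could carry both conditions; a rough sketch, assuming the timestamp bound is written by hand (the column names, the 14-day window, and the dateadd syntax are made up and warehouse-specific):

```yaml
# models/schema.yml -- illustrative names; both filters in one where clause
version: 2

models:
  - name: orders
    tests:
      - elementary.table_anomalies:
          config:
            # exclude internal test data and limit to recent rows in one condition
            where: "some_column != True and updated_at >= dateadd(day, -14, current_date)"
```

The downside of this workaround is that the user re-implements the time window by hand instead of relying on the test's built-in timestamp handling.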