dbt-databricks
dbt-databricks copied to clipboard
test flow of data using time travel
No idea if other people would use this but I've needed this for a while.
Problem
Data is only allowed to change in one direction following some flow. Like Step 1
-> Step 2
-> Step 3
and we want to fail if data changes from Step 3
to Step 2
. Delta's time travel lets us actually test for this.
Proposal
This might not be the best method (or the right repo) but here's what I've come up with
{% test workflow_diode(model, column_name, step_order, id_column) %}
{% set sql_statement %}
select version from (describe history {{model}}) limit 1
{% endset %}
{% set latest_version = dbt_utils.get_single_value(sql_statement) %}
SELECT *
FROM (
SELECT cur_version.{{id_column}},
CASE
{%- for s in step_order %}
WHEN cur_version.{{column_name}} = '{{s}}' THEN {{loop.index}}
{% endfor -%}
END AS cur_flow,
CASE
{%- for s in step_order %}
WHEN prev_version.{{column_name}} = '{{s}}' THEN {{loop.index}}
{% endfor -%}
END AS prev_flow
FROM {{model}}@v{{latest_version}} AS cur_version
INNER JOIN {{model}}@v{{((latest_version|int)-1)}} AS prev_version
ON cur_version.{{id_column}} = prev_version.{{id_column}}
)
WHERE cur_flow < prev_flow
{% endtest %}
Where my schema looks like
sources:
- name: test_flow
tables:
- name: test_flow
columns:
- name: step
tests:
- workflow_diode:
step_order: ['Step 1', 'Step 2', 'Step 3', 'Step 4']
id_column: id
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.
Nice!
I'm unclear on exactly what you're trying to assert in the test here?
@benc-db That something never goes from Step3 to step2. If I have a workflow with a specific order of tasks I just don't want things to go in reverse. My actual use case is to test product lifecycles as we're supposed to only go one way.