dbt-databricks icon indicating copy to clipboard operation
dbt-databricks copied to clipboard

test flow of data using time travel

Open VDFaller opened this issue 2 years ago • 4 comments

No idea if other people would use this but I've needed this for a while.

Problem

Data is only allowed to change in one direction following some flow. Like Step 1 -> Step 2 -> Step 3 and we want to fail if data changes from Step 3 to Step 2. Delta's time travel lets us actually test for this.

Proposal

This might not be the best method (or the right repo) but here's what I've come up with

{% test workflow_diode(model, column_name, step_order, id_column) %}

{% set sql_statement %}
    select version from (describe history {{model}}) limit 1
{% endset %}

{% set latest_version = dbt_utils.get_single_value(sql_statement) %}
SELECT * 
FROM (
  SELECT cur_version.{{id_column}}, 
    CASE 
      {%- for s in step_order %}
      WHEN cur_version.{{column_name}} = '{{s}}' THEN {{loop.index}} 
      {% endfor -%}
    END AS cur_flow, 
    CASE 
      {%- for s in step_order %}
      WHEN prev_version.{{column_name}} = '{{s}}' THEN {{loop.index}}
      {% endfor -%}    
    END AS prev_flow
  FROM {{model}}@v{{latest_version}} AS cur_version
  INNER JOIN {{model}}@v{{((latest_version|int)-1)}} AS prev_version
   ON cur_version.{{id_column}} = prev_version.{{id_column}}
)
WHERE cur_flow < prev_flow
{% endtest %}

Where my schema looks like

sources:
  - name: test_flow
    tables: 
      - name: test_flow
        columns:
          - name: step
            tests: 
              - workflow_diode:
                  step_order: ['Step 1', 'Step 2', 'Step 3', 'Step 4']
                  id_column: id

VDFaller avatar Jan 17 '23 19:01 VDFaller

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

github-actions[bot] avatar Jul 17 '23 02:07 github-actions[bot]

Nice!

henrikamnas avatar Dec 02 '23 22:12 henrikamnas

I'm unclear on exactly what you're trying to assert in the test here?

benc-db avatar Jan 12 '24 17:01 benc-db

@benc-db That something never goes from Step3 to step2. If I have a workflow with a specific order of tasks I just don't want things to go in reverse. My actual use case is to test product lifecycles as we're supposed to only go one way.

VDFaller avatar Jan 12 '24 17:01 VDFaller