
[Unit Testing] Allow explicit precision testing in unit tests

Open emmyoop opened this issue 2 years ago • 4 comments

Housekeeping

  • [X] I am a maintainer of dbt-core

Short description

We cannot currently unit test precision. The expected outputs are cast (and thereby rounded) to the model's data types, so precision differences cannot be detected.

from @MichelleArk in #9627

I think it would likely be in get_expected_sql, since that applies casting to the 'expected' structure based on the sql of the model being tested.

I think there are a couple ways we can approach a fix here:

An explicit option for users to opt out of the automatic casting of expected values, e.g. something called '_dbt_skip_casting' or '_dbt_use_literal'. I think this option should only be valid on the expect rows, not the input ones: there is no benefit to testing your model against a different input schema, since input schemas are fixed in production.
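To make the proposed interface concrete, here is a minimal Python sketch of how a `_dbt_skip_casting` marker could be stripped from expected rows before any casts are rendered. This is purely illustrative of the proposal above, not dbt's actual implementation.

```python
def split_skip_casting(rows):
    """Return (cleaned rows, per-row sets of columns to leave uncast).

    Hypothetical helper: strips the proposed `_dbt_skip_casting` key from
    each expected row and records which columns should be compared as
    literals instead of being cast to the model's column types.
    """
    cleaned, skips = [], []
    for row in rows:
        row = dict(row)  # don't mutate the caller's data
        skip_cols = set(row.pop("_dbt_skip_casting", []))
        cleaned.append(row)
        skips.append(skip_cols)
    return cleaned, skips


rows = [
    {"payment_id": 3, "dollar": 0.333},
    {"payment_id": 4, "dollar": 0.333323, "_dbt_skip_casting": ["dollar"]},
]
cleaned, skips = split_skip_casting(rows)
# cleaned[1] no longer carries the marker; skips[1] == {"dollar"}
```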

Acceptance criteria

  • A unit test can be written to test precision

Suggested Tests

model.sql

with

source as (

    select * from {{ source('stripe', 'payment') }}

),

staged as (

    select
        id as payment_id,
        orderid as order_id,
        paymentmethod as payment_method,
        status,
        -- amount is stored in cents, convert it to dollars
        round(amount / 100, 2) as amount,
        created as created_at
    from source

)

select * from staged

failing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe', 'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}

passing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe', 'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.50}
        - {payment_id: 3, amount: 0.33}
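The difference between the two YAML files comes down to the expected values being cast to the model's output type before comparison. A Python `decimal` sketch of that cast (assuming half-up rounding comparable to the database's numeric rounding) shows why a precision-testing expectation can never fail:

```python
from decimal import Decimal, ROUND_HALF_UP


def cast_numeric_10_2(value):
    """Mimic `cast(value as numeric(10, 2))`: round to two decimal places."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The "failing" expectation of 0.3333 is silently rounded to 0.33 by the cast,
# so it matches the model output and the precision difference goes undetected.
assert cast_numeric_10_2(0.3333) == Decimal("0.33")

# Likewise 0.5 and 0.50 become indistinguishable after the cast.
assert cast_numeric_10_2(0.5) == Decimal("0.50")
```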

Impact to Other Teams

none

Will backports be required?

no

Context

Outcome of spike in https://github.com/dbt-labs/dbt-core/issues/9627

This will require a change to the macros in dbt-adapters and tests in dbt-core. The tests in dbt-core will not pass until a new version of dbt-adapters is released.

emmyoop avatar Apr 09 '24 15:04 emmyoop

@graciegoheen Can you take a look at spike outcome here and see whether you think this ticket is worth pursuing?

ChenyuLInx avatar May 06 '24 16:05 ChenyuLInx

If we are not moving forward on this ticket, we should document that this does not work.

ChenyuLInx avatar May 06 '24 16:05 ChenyuLInx

I think:

  • we should try to support this
  • I'm not convinced that the suggested interface
expect:
      rows:
        - {payment_id: 3, dollar: .333}            # expected
        - {payment_id: 4, dollar: 0.333323, _dbt_skip_casting: ["dollar"]}

is the best place to put this configuration

The configuration is:

  • only relevant for expect (?)
  • specific to a given set of columns

I'll sync with unit testing squad on what interface would be best here

graciegoheen avatar May 07 '24 00:05 graciegoheen

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type? i.e., numeric without precision and scale rather than numeric(10, 2)?

It might give us the failing unit test behavior we are after.

My understanding is that we use "wide" data types by default when enforcing model contracts. I'm not sure if there's prior art we could re-use here.
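The "widest type" idea can be sketched with Python's `decimal` module (illustrative only; actual behavior depends on the database's cast semantics): without a fixed scale, the extra digits in the expected value survive the cast, so the comparison can fail as intended.

```python
from decimal import Decimal

# Expected value cast to the "widest" numeric (no precision/scale):
expected_wide = Decimal("0.3333")   # like cast(0.3333 as numeric)
actual = Decimal("0.33")            # model output under numeric(10, 2)

# The precision mismatch is now detectable...
assert expected_wide != actual

# ...whereas under cast(... as numeric(10, 2)) the same comparison
# would spuriously pass, because the cast rounds 0.3333 down to 0.33.
expected_narrow = expected_wide.quantize(Decimal("0.01"))
assert expected_narrow == actual
```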

Code example

models/source_data.sql

select 2 as id, 50 as amount union all
select 3 as id, 33.33 as amount

models/stg_payments.sql

with

source as (

    select * from {{ ref("source_data") }}

),

staged as (

    select
        id as payment_id,
        -- amount is stored in cents, convert it to dollars
        cast(round(amount / 100, 2) as numeric(10, 2)) as amount
    from source

)

select * from staged

models/_unit.yml

unit_tests:
  - name: test_dollar_conversion
    model: stg_payments
    given:
      - input: ref("source_data")
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect:
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}
dbt build
cp target/run/my_project/models/_unit.yml/models/test_dollar_conversion.sql analyses/test_dollar_conversion__current.sql
cp analyses/test_dollar_conversion__current.sql analyses/test_dollar_conversion__proposed.sql

Then modify analyses/test_dollar_conversion__proposed.sql like this:

55c55
<     cast(0.5 as numeric(10,2))
---
>     cast(0.5 as numeric)
63c63
<     cast(0.3333 as numeric(10,2))
---
>     cast(0.3333 as numeric)
dbt show -s analyses/test_dollar_conversion__current.sql
dbt show -s analyses/test_dollar_conversion__proposed.sql

Experiment

I tried changing this line to be column.dtype instead of column.data_type.

It worked when I tried it, but only if I quoted the "0.50" portion.

    expect:
      rows:
        - {payment_id: 2, amount: "0.50"}
        - {payment_id: 3, amount: 0.33}

Otherwise, I got this error:

[screenshot of the error omitted]

I didn't try any adapters other than dbt-postgres. The behavior might depend on the way each Python driver renders the value to a string 🤷.

Using the sql format like this also worked along with the data_type -> dtype change:

    expect:
        format: sql
        rows: |
          select 2 as payment_id, cast(0.50 as numeric(10, 2)) as amount union all
          select 3 as payment_id, cast(0.33 as numeric(10, 2)) as amount
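The `column.data_type` vs. `column.dtype` distinction that the experiment relies on can be sketched as follows. This is a simplified stand-in, not dbt's real `Column` class (which lives in dbt-adapters and carries more logic), but it shows why swapping the attribute drops the precision/scale from the rendered cast:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Simplified sketch of an adapter Column (illustrative only)."""
    name: str
    dtype: str                            # bare type name, e.g. "numeric"
    numeric_precision: Optional[int] = None
    numeric_scale: Optional[int] = None

    @property
    def data_type(self) -> str:
        # Full type including precision/scale when known
        if self.numeric_precision is not None and self.numeric_scale is not None:
            return f"{self.dtype}({self.numeric_precision},{self.numeric_scale})"
        return self.dtype


amount = Column("amount", "numeric", 10, 2)

# Rendering the cast with data_type keeps the rounding behaviour,
# while dtype produces an unconstrained numeric:
cast_with_data_type = f"cast(0.3333 as {amount.data_type})"  # numeric(10,2)
cast_with_dtype = f"cast(0.3333 as {amount.dtype})"          # numeric
```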

dbeatty10 avatar May 07 '24 15:05 dbeatty10

We ran into this and had a related discussion on the dbt Slack here.

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type?

For Redshift at least, casting to a bare numeric results in numeric(18, 0).

An option to use the types defined in our contract for the model would have been a nice way to solve this from our perspective. Not being too familiar with the implementation, I thought the contract was the source of truth for the expected data types. But that may not be possible if the contract uses "wide" data types.
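The Redshift behavior described above undermines the "cast to the widest numeric" idea: if a bare numeric defaults to scale 0, the expected value gets rounded to an integer. A small `decimal` sketch (assuming half-up rounding; actual Redshift rounding may differ):

```python
from decimal import Decimal, ROUND_HALF_UP


def cast_numeric_18_0(value):
    """Mimic Redshift's `cast(value as numeric)`, i.e. numeric(18, 0):
    round to zero decimal places."""
    return Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# Every sub-dollar expected value collapses to 0, so the
# "cast to widest numeric" approach breaks down on Redshift.
assert cast_numeric_18_0(0.33) == Decimal("0")
assert cast_numeric_18_0(0.3333) == Decimal("0")
```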

KaneMorgan avatar Jul 31 '24 16:07 KaneMorgan