[Unit Testing] Allow explicit precision testing in unit tests
Housekeeping
- [X] I am a maintainer of dbt-core
Short description
We cannot currently unit test precision. The expected outputs get cast to the model's data types, which rounds them away from what the user wrote, so precision differences are never detected.
from @MichelleArk in #9627
I think it would likely be in `get_expected_sql`, since that applies casting to the 'expected' structure based on the sql of the model being tested.
I think there are a couple ways we can approach a fix here:
Explicit option for users to opt out of the automatic casting of expected values, e.g. something called `_dbt_skip_casting` or `_dbt_use_literal`. I think this option should only be valid on the expect rows as opposed to input ones, since there isn't any benefit to testing your model against a different input schema, as those are fixed in production.
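To make the failure mode concrete, here is a minimal Python sketch of the problem, using `decimal.Decimal` to stand in for SQL `numeric` semantics. The `cast_to_numeric` helper is hypothetical and only mimics what a SQL `cast(x as numeric(10, 2))` does; dbt applies the actual cast in SQL, not Python.

```python
from decimal import Decimal

def cast_to_numeric(value, scale=2):
    # Hypothetical stand-in for SQL `cast(x as numeric(10, 2))`:
    # quantize to `scale` fractional digits.
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))

# The model produces 0.33, i.e. round(33.33 / 100, 2).
actual = cast_to_numeric(Decimal("33.33") / 100)

# The user expects 0.3333 -- but today the expected value is cast the
# same way, so the precision difference is erased and the test passes.
expected = cast_to_numeric("0.3333")
assert actual == expected

# Comparing against the literal (uncast) value would catch the mismatch:
assert Decimal("0.3333") != actual
```

An opt-out like `_dbt_skip_casting` would effectively switch the comparison from the first assertion to the second.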
Acceptance criteria
- A unit test can be written to test precision
Suggested Tests
model.sql

```sql
with

source as (
    select * from {{ source('stripe', 'payment') }}
),

staged as (
    select
        id as payment_id,
        orderid as order_id,
        paymentmethod as payment_method,
        status,
        -- amount is stored in cents, convert it to dollars
        round(amount / 100, 2) as amount,
        created as created_at
    from source
)

select * from staged
```
failing unit test yaml

```yaml
unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe', 'payment')
        rows:
          - {id: 2, amount: 50}
          - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}
```
passing unit test yaml

```yaml
unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe', 'payment')
        rows:
          - {id: 2, amount: 50}
          - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.50}
        - {payment_id: 3, amount: 0.33}
```
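As a sanity check on the expected rows above, the model's `round(amount / 100, 2)` can be reproduced in Python with `Decimal` (a sketch; the hypothetical `to_dollars` helper only mirrors the SQL rounding, it is not part of dbt):

```python
from decimal import Decimal

def to_dollars(amount_cents):
    # Mirror the model's `round(amount / 100, 2)`, using Decimal to
    # mimic SQL numeric semantics (illustrative only).
    return (Decimal(amount_cents) / 100).quantize(Decimal("0.01"))

assert to_dollars("50") == Decimal("0.50")     # row {id: 2}
assert to_dollars("33.33") == Decimal("0.33")  # row {id: 3}

# A value with four decimal places can never come out of round(..., 2),
# so the first yaml's 0.3333 expectation should fail -- but only if the
# expected value is compared without being cast down to numeric(10, 2).
assert to_dollars("33.33") != Decimal("0.3333")
```

So the first yaml should fail and the second should pass once precision is tested literally.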
Impact to Other Teams
none
Will backports be required?
no
Context
Outcome of spike in https://github.com/dbt-labs/dbt-core/issues/9627
This will require a change to the macros in dbt-adapters and to tests in dbt-core. The tests in dbt-core will not pass until a new version of dbt-adapters is released.
@graciegoheen Can you take a look at spike outcome here and see whether you think this ticket is worth pursuing?
If we are not moving forward on this ticket, we should document that this does not work.
I think:
- we should try to support this
- I'm not convinced the suggested interface

  ```yaml
  expect:
    rows:
      - {payment_id: 3, dollar: .333} # expected
      - {payment_id: 4, dollar: 0.333323, _dbt_skip_casting: ["dollar"]}
  ```

  is the best place to put this configuration
The configuration is:
- only relevant for `expect` (?)
- specific to some subset of columns
I'll sync with the unit testing squad on what interface would be best here
What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type? i.e., `numeric` without precision and scale rather than `numeric(10, 2)`?
It might make the unit test fail in the way that we are after.
My understanding is that we use "wide" data types by default when enforcing model contracts. I'm not sure if there's prior art we could re-use here.
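A rough sketch of the "wide type" idea, again using Python's `Decimal` to stand in for SQL `numeric` (an assumption; actual behavior varies by database, as the Redshift note below shows). Casting to an unconstrained `numeric` keeps each literal's own scale, so a precision mismatch survives the cast:

```python
from decimal import Decimal

# cast(0.3333 as numeric) -- wide cast keeps the literal's own scale:
wide_expected = Decimal("0.3333")
# what the model actually produced:
wide_actual = Decimal("0.33")

# The mismatch survives, so the unit test can fail as intended:
assert wide_expected != wide_actual

# Whereas casting both sides to numeric(10, 2) quantizes them to the
# same value, and the mismatch disappears:
two_places = Decimal("0.01")
assert wide_expected.quantize(two_places) == wide_actual.quantize(two_places)
```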
Code example
models/source_data.sql

```sql
select 2 as id, 50 as amount union all
select 3 as id, 33.33 as amount
```

models/stg_payments.sql

```sql
with

source as (
    select * from {{ ref("source_data") }}
),

staged as (
    select
        id as payment_id,
        -- amount is stored in cents, convert it to dollars
        cast(round(amount / 100, 2) as numeric(10, 2)) as amount
    from source
)

select * from staged
```
models/_unit.yml

```yaml
unit_tests:
  - name: test_dollar_conversion
    model: stg_payments
    given:
      - input: ref("source_data")
        rows:
          - {id: 2, amount: 50}
          - {id: 3, amount: 33.33}
    expect:
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}
```
```shell
dbt build
cp target/run/my_project/models/_unit.yml/models/test_dollar_conversion.sql analyses/test_dollar_conversion__current.sql
cp analyses/test_dollar_conversion__current.sql analyses/test_dollar_conversion__proposed.sql
```
Then modify analyses/test_dollar_conversion__proposed.sql like this:
```diff
55c55
< cast(0.5 as numeric(10,2))
---
> cast(0.5 as numeric)
63c63
< cast(0.3333 as numeric(10,2))
---
> cast(0.3333 as numeric)
```
```shell
dbt show -s analyses/test_dollar_conversion__current.sql
dbt show -s analyses/test_dollar_conversion__proposed.sql
```
Experiment
I tried changing this line to be `column.dtype` instead of `column.data_type`.
It worked when I tried it, but only if I quoted the "0.50" portion.
```yaml
expect:
  rows:
    - {payment_id: 2, amount: "0.50"}
    - {payment_id: 3, amount: 0.33}
```
Otherwise, I got this error:
I didn't try any adapters other than dbt-postgres. The behavior might depend on the way each Python driver renders the value to a string 🤷.
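That driver-rendering hunch can be illustrated with a small sketch (an assumption about where the string comparison happens, not a confirmed trace of the dbt code path): psycopg2 returns Postgres `numeric` values as `decimal.Decimal`, whose string form preserves trailing zeros, while an unquoted YAML `0.50` is parsed as a float and renders without them.

```python
from decimal import Decimal

# psycopg2 hands back Postgres numeric values as decimal.Decimal,
# which keeps trailing zeros when rendered as a string:
db_value = Decimal("0.50")
assert str(db_value) == "0.50"

# An unquoted YAML scalar 0.50 parses as a Python float and drops
# the trailing zero in its string form:
yaml_value = 0.50
assert str(yaml_value) == "0.5"

# So a string-level comparison fails even though the values are
# numerically equal -- unless the YAML value is quoted:
assert str(db_value) != str(yaml_value)
assert Decimal(str(yaml_value)) == db_value
```

Quoting `"0.50"` in the YAML sidesteps the float round-trip entirely, which would explain why only the quoted form worked.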
Using the `sql` format like this also worked, along with the `data_type` -> `dtype` change:

```yaml
expect:
  format: sql
  rows: |
    select 2 as payment_id, cast(0.50 as numeric(10, 2)) as amount union all
    select 3 as payment_id, cast(0.33 as numeric(10, 2)) as amount
```
We ran into this too and had a related discussion on the dbt Slack here
What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type?
For Redshift at least, casting to just `numeric` results in `numeric(18, 0)`.
An option to use the types defined in our contract for the model would have been a nice way to solve this from our perspective. Not being too familiar with the implementation, I thought the contract was the source of truth for the expected data types. But that may not be possible if the contract uses "wide" data types.