[Feature] Implicitly cast nested `STRUCT` properties during unit test fixture generation
Is this a new bug in dbt-bigquery?
- [X] I believe this is a new bug in dbt-bigquery
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Unit testing models is fine at row level. But when dbt tries to create a fixture for a nested struct, it does not safe cast all the missing properties. Even the nullable properties of any struct need to be inserted manually at the test in order to make it run without errors.
Expected Behavior
When compiling unit tests, dbt's fixture generation should create safe casted nulls recursively for every missing property of nested structs.
Steps To Reproduce
- Create this GBQ table:
CREATE TABLE gbq_dataset.example_table(
id INT NOT NULL,
element_name VARCHAR(255),
content ARRAY<STRUCT<name VARCHAR(255), description VARCHAR(255), date TIMESTAMP>>
PRIMARY KEY (id)
);
- Create this simple dbt model:
{{
config(materialized='table')
}}
SELECT
*
FROM {{ source('gbq_dataset', 'example_table') }}
- Create this unit_test.yml:
unit_tests:
- name: example_model_refresh
model: example_model
given:
- input: source('gbq_dataset', 'example_table')
rows:
- {
id: 1,
element_name: 'test',
content: ['struct("John" as name)']
}
expect:
rows:
- {
id: 1,
element_name: 'test',
content: ['struct("John" as name)']
}
- The resulting compiled SQL will be like this:
/* {"app": "dbt", "dbt_version": "1.8.0b1", "profile_name": "gbq_project", "target_name": "dev", "node_id": "unit_test.gbq_project.example.example_model_refresh"} */
create or replace table `gbq_project`.`gbq_dataset`.`example_model_refresh__dbt_tmp`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
)
as (
select * from (
with __dbt__cte__example_table as (
-- Fixture for example_table
select safe_cast(1 as NUMERIC) as id, safe_cast('test' as STRING) as element_name,
(select array_agg(safe_cast(i as struct<`name` string, `description` string, `date` timestamp>)) from unnest([
struct("John" as name)
]) i) as content
) SELECT
*
FROM __dbt__cte__example_table
) as __dbt_sbq
where false and current_timestamp() = current_timestamp()
limit 0
);
- The error below is shown:
Invalid cast from STRUCT<name STRING> to STRUCT<name STRING, description STRING, date TIMESTAMP> at [23:37]
Relevant log output
No response
Environment
- OS: macOS Sonoma 14.4.1
- Python: 3.12.2
- dbt-core: v1.8.0-b1
- dbt-bigquery: v1.8.0-b2
Additional Context
No response
Thanks for reporting this @edudosaara !
After getting insights from @MichelleArk, we believe we should probably document this as a known limitation, and I've opened https://github.com/dbt-labs/docs.getdbt.com/issues/5307 as a result.
All fields in a BigQuery struct need to be specified in a unit test -- it's not currently possible to use only a subset of columns in a struct.
But we agree that we'd like to support this.
To support this, it would probably take writing some struct-specific handling of missing fields in the safe_cast method.
Workaround
Include all the fields in a BigQuery struct within the unit test.