Add SQL model validation for full and incremental models
Add SQL model validation for full and incremental models
By default model tests run on the completed model output. If a partition is specified model tests will executed per partition.
# example usage
type: model
sql: SELECT range AS number from range(0, 10)
tests:
- name: Number is lower than 10
sql: SELECT 1 FROM foo WHERE number >= 10
# ... snip
# example usage with partitions
type: model
sql: SELECT range AS number, range % 2 AS partition_key FROM range(0, 10)
partitions:
sql: SELECT DISTINCT partition_key FROM (SELECT range % 2 AS partition_key FROM range(0, 10))
partition_tests:
- name: Partition has only even or odd numbers
sql: SELECT 1 FROM m1 WHERE (partition_key = 0 AND number % 2 != 0) OR (partition_key = 1 AND number % 2 != 1)
- name: Partition key is not null
sql: SELECT 1 FROM m1 WHERE partition_key IS NULL
# ... snip
https://github.com/user-attachments/assets/622b6e34-ff13-4024-87a3-bc1906016d4b
Checklist:
- [x] Covered by tests
- [x] Ran it and it works as intended
- [ ] Reviewed the diff before requesting a review
- [ ] Checked for unhandled edge cases
- [ ] Linked the issues it closes
- [ ] Checked if the docs need to be updated
- [ ] Intend to cherry-pick into the release branch
- [ ] I'm proud of this work!
If a test fails, the resulting table view is empty, this is a bit painful if you want to have an iterative write/test/fix workflow, where you first write a sql model and then start to add tests onto it and when a test fails you can iteratively fix your model until the test passes, however when the data goes blank because the test failed you can't iterate effectively (as you can no longer see whats going on)
Ideally the tests could have a warning at the bottom for which ones failed, but still allow the model to exist. This approach allows people to incrementally adopt testing into their workflow if they're not used to it, otherwise the hard block and assert makes it much harder feature for an end user to adopt and ease into.
Longer term would be nice if the SQL test can also display what the failure was (the returned rows can include concatenated strings on what broke to help us debug further). Also surfacing them on end user explore dashboards. For instance if you're in finance looking at your finance dashboard and the underlying model has test failures, it is good to know that the data you're looking at failed some integrity tests.
Anything left to do here?
Wanted to bump here @begelundmuller
Wanted to bump here @begelundmuller
@avaitla We've run into some tricky edge cases around how tests interact with model execution that are taking a while to get right. We're hoping to get this shipped soon!
I like the mix of sql and assert, it makes the simple cases easier but still gives the power of flexible queries. Is there a simple for assert saying "user_id" is unique, or ("user_id", "date") is unique composite, thats another common case.
Also curious when a few tests failed where they will appear and what the message says. Will the dashboard viewer see it somewhere to know the dashboard they're looking at has errors in the underlying model.
I like the mix of sql and assert, it makes the simple cases easier but still gives the power of flexible queries. Is there a simple for assert saying "user_id" is unique, or ("user_id", "date") is unique composite, thats another common case.
Also curious when a few tests failed where they will appear and what the message says. Will the dashboard viewer see it somewhere to know the dashboard they're looking at has errors in the underlying model.
Initially it will be surfaced through the existing "error handling popover" that appears in the lower third of the table results. I will be setting up some time with the frontend folks. Happy to capture your thoughts
If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.
Would this be the best way to encode a unique column test?
SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1
This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements
If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.
Would this be the best way to encode a unique column test?
SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements
The model test errors will not prevent dashboards from being available to end users. If model tests fail, the reconciler returns an error, but it does not delete or roll back the model output.