rill icon indicating copy to clipboard operation
rill copied to clipboard

Add SQL model validation for full and incremental models

Open grahamplata opened this issue 7 months ago • 5 comments

Add SQL model validation for full and incremental models

By default model tests run on the completed model output. If a partition is specified model tests will executed per partition.

# example usage
type: model
sql: SELECT range AS number from range(0, 10)
tests:
  - name: Number is lower than 10
    sql: SELECT 1 FROM foo WHERE number >= 10
# ... snip
# example usage with partitions
type: model
sql: SELECT range AS number, range % 2 AS partition_key FROM range(0, 10)
partitions:
  sql: SELECT DISTINCT partition_key FROM (SELECT range % 2 AS partition_key FROM range(0, 10))
partition_tests:
  - name: Partition has only even or odd numbers
    sql: SELECT 1 FROM m1 WHERE (partition_key = 0 AND number % 2 != 0) OR (partition_key = 1 AND number % 2 != 1)
  - name: Partition key is not null
    sql: SELECT 1 FROM m1 WHERE partition_key IS NULL
# ... snip

https://github.com/user-attachments/assets/622b6e34-ff13-4024-87a3-bc1906016d4b

Checklist:

  • [x] Covered by tests
  • [x] Ran it and it works as intended
  • [ ] Reviewed the diff before requesting a review
  • [ ] Checked for unhandled edge cases
  • [ ] Linked the issues it closes
  • [ ] Checked if the docs need to be updated
  • [ ] Intend to cherry-pick into the release branch
  • [ ] I'm proud of this work!

grahamplata avatar May 20 '25 19:05 grahamplata

If a test fails, the resulting table view is empty, this is a bit painful if you want to have an iterative write/test/fix workflow, where you first write a sql model and then start to add tests onto it and when a test fails you can iteratively fix your model until the test passes, however when the data goes blank because the test failed you can't iterate effectively (as you can no longer see whats going on)

Screenshot 2025-05-23 at 2 34 16 PM

Ideally the tests could have a warning at the bottom for which ones failed, but still allow the model to exist. This approach allows people to incrementally adopt testing into their workflow if they're not used to it, otherwise the hard block and assert makes it much harder feature for an end user to adopt and ease into.

avaitla avatar May 23 '25 19:05 avaitla

Longer term would be nice if the SQL test can also display what the failure was (the returned rows can include concatenated strings on what broke to help us debug further). Also surfacing them on end user explore dashboards. For instance if you're in finance looking at your finance dashboard and the underlying model has test failures, it is good to know that the data you're looking at failed some integrity tests.

avaitla avatar May 23 '25 20:05 avaitla

Anything left to do here?

avaitla avatar Jun 02 '25 23:06 avaitla

Wanted to bump here @begelundmuller

avaitla avatar Jun 10 '25 17:06 avaitla

Wanted to bump here @begelundmuller

@avaitla We've run into some tricky edge cases around how tests interact with model execution that are taking a while to get right. We're hoping to get this shipped soon!

begelundmuller avatar Jun 12 '25 21:06 begelundmuller

I like the mix of sql and assert, it makes the simple cases easier but still gives the power of flexible queries. Is there a simple for assert saying "user_id" is unique, or ("user_id", "date") is unique composite, thats another common case.

Also curious when a few tests failed where they will appear and what the message says. Will the dashboard viewer see it somewhere to know the dashboard they're looking at has errors in the underlying model.

avaitla avatar Jun 24 '25 21:06 avaitla

I like the mix of sql and assert, it makes the simple cases easier but still gives the power of flexible queries. Is there a simple for assert saying "user_id" is unique, or ("user_id", "date") is unique composite, thats another common case.

Also curious when a few tests failed where they will appear and what the message says. Will the dashboard viewer see it somewhere to know the dashboard they're looking at has errors in the underlying model.

Initially it will be surfaced through the existing "error handling popover" that appears in the lower third of the table results. I will be setting up some time with the frontend folks. Happy to capture your thoughts

grahamplata avatar Jun 24 '25 22:06 grahamplata

If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.

Would this be the best way to encode a unique column test?

SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1

This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements

avaitla avatar Jun 25 '25 15:06 avaitla

If an error is shown for tests, will the end dashboard still be up for end users? Ideally I imagine it could be.

Would this be the best way to encode a unique column test?

SELECT 1 FROM table GROUP BY uniq_key HAVING COUNT(*) > 1

This is a good reference for some similar tests: https://cloud.google.com/bigquery/docs/reference/standard-sql/debugging-statements

The model test errors will not prevent dashboards from being available to end users. If model tests fail, the reconciler returns an error, but it does not delete or roll back the model output.

grahamplata avatar Jul 02 '25 18:07 grahamplata