sqlmesh Various unit test improvements

Open georgesittas opened this issue 1 year ago • 2 comments

[x] Add option for allowing empty / unspecified columns in unit tests (related Slack thread here)

The use case: with wide tables, users may only want to write tests that focus on a group of related columns and ignore the rest.
[x] Make unit test output easier to read (previous ticket: #948)

When a unit test fails, the whole row shows as different from expected. For wide tables, this makes it hard to spot where the difference is. The output should highlight which columns differ.
[x] Automatically generate unit tests

Writing unit tests by hand is cumbersome and error-prone, so one idea is to generate them automatically by specifying queries that will be used to populate the input models.
[x] Allow running tests against a subset of resulting columns

Users with a very wide table may want to test the output against a specific subset of its columns. This change would allow defining just that subset in the expected result without raising an error.
[x] Support nested data types

Currently we don't properly support nested data types (Arrays/Lists/Maps/Structs). Slack context: https://tobiko-data.slack.com/archives/C044BRE5W4S/p1703270196743739
[x] Improve error message when there are missing rows in the DataFrame diff (related Slack thread here).

If the actual and expected DataFrames have a different row count, we currently only report that there's a shape mismatch between the two, without providing details. One idea to improve the UX here is to report exactly which rows are missing from the actual data, or limit them using a threshold in case there are many of them.
[x] Generate CTE fixtures too when using the create_test command

The first iteration on the automatic unit test generation didn't include logic for producing CTE outputs.
[x] Add optional setting which allows users to freeze time in order to test CURRENT_TIMESTAMP values and the like

We can leverage freezegun for this. The idea is that if we have e.g. a freeze_time: <date or timestamp> setting in a test, we'll wrap the execution of a query in the relevant context with the current time set to that value.
[x] Ensure unit test fixtures are unique to avoid issues when the test connection is used by multiple users concurrently.
[x] Add support for "dumping" duckdb state on test failure to speed up debugging

Slack Context (I have this conversation backed up if needed).

Summary: Add a DuckDB-specific mode that, upon test failure, "dumps" the state of the test to a *.db file, which a user can then load to get the exact state of the test at the time of failure. It was also requested that we create views representing the CTEs of the failed model, so one can easily query up to a specific point in the CTE chain without extra copy/pasting. This would be enabled through an argument and be off by default.
[x] Allow users to override the warehouse which their unit tests run against

Some models can be tricky or even impossible to transpile correctly in order to run their queries in unit tests. In such cases, it can be helpful for users to specify a different warehouse to run their tests against (e.g. their default connection).
[ ] Allow unit tests to be defined in the CSV format, either inline or using fixture files.
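
For the readability and missing-rows items above, here is a minimal sketch of what a column-level diff could report. It works on plain lists of dicts rather than DataFrames, and the helper names (`column_mismatches`, `missing_rows`) and the `limit` threshold are hypothetical, not part of SQLMesh. Only the columns present in the expected rows are compared, which is one possible way to read the "subset of columns" item.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def column_mismatches(expected: List[Row], actual: List[Row]) -> List[Tuple[int, str, Any, Any]]:
    """Return (row_index, column, expected_value, actual_value) per differing cell.

    Hypothetical helper: rows are paired by position, and only the columns
    listed in the expected row are checked, so extra actual columns are ignored.
    """
    mismatches = []
    for index, (exp, act) in enumerate(zip(expected, actual)):
        for column, exp_value in exp.items():
            act_value = act.get(column)
            if act_value != exp_value:
                mismatches.append((index, column, exp_value, act_value))
    return mismatches


def missing_rows(expected: List[Row], actual: List[Row], limit: int = 10) -> List[Row]:
    """Return expected rows with no match in actual, capped at `limit` rows.

    Hypothetical helper: a match means every expected column has the same
    value in some not-yet-matched actual row.
    """
    remaining = list(actual)
    missing = []
    for exp in expected:
        match = next(
            (act for act in remaining if all(act.get(c) == v for c, v in exp.items())),
            None,
        )
        if match is None:
            missing.append(exp)
        else:
            remaining.remove(match)
    return missing[:limit]


expected = [{"id": 1, "total": 10}, {"id": 2, "total": 20}, {"id": 3, "total": 30}]

# Same row count as the first two expected rows, one cell differs.
actual = [{"id": 1, "total": 10, "note": "a"}, {"id": 2, "total": 25, "note": "b"}]
print(column_mismatches(expected, actual))  # [(1, 'total', 20, 25)]

# Fewer rows than expected: report which ones are absent.
actual_short = [{"id": 1, "total": 10, "note": "a"}]
print(missing_rows(expected, actual_short))
```

Reporting `(row, column, expected, actual)` tuples keeps the failure message proportional to the number of differing cells instead of the table width.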

Oct 30 '23 22:10 georgesittas
