soda-sql
Store scan metadata to warehouse/cloud/scan result: `soda scan --store-metadata`
Is your feature request related to a problem? Please describe.
Store scan metadata in the warehouse, such as which queries, tests, and metrics were executed and how long they took.
I would like to use this table to monitor Soda scans. (Yes, very meta, monitoring the monitoring system.) We could provide a default scan definition that goes with this table. The use cases I foresee:
- Get insights on how long certain tests take.
- Get warnings if certain tests execute a lot longer than they did before.
- Get insights about which tests fail most often.
- Get warnings if there is an increase in test failures.
Describe the solution you'd like
When the user adds the flag `--store-metadata` (or something similar) to a soda scan, we store the metadata of that particular run in a table in the database. The table would look something like:
| run_timestamp | dataset | test name | execution time | result |
|---|---|---|---|---|
| 2021-10-26T15:55:50 | my_table | row count == 0 | 00:00:10 | FAILED |
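To make the proposed table concrete, here is a minimal sketch of what storing one run's metadata could look like. It uses Python's built-in `sqlite3` as a stand-in warehouse. The table name `soda_scan_metadata`, the `ScanTestRecord` class, and the `store_metadata` function are all hypothetical names for illustration and are not part of soda-sql.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ScanTestRecord:
    """One row of the proposed metadata table (hypothetical shape)."""
    run_timestamp: str
    dataset: str
    test_name: str
    execution_seconds: float
    result: str


def store_metadata(conn, records):
    """Create the metadata table if needed and append the given records."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS soda_scan_metadata (
            run_timestamp TEXT,
            dataset TEXT,
            test_name TEXT,
            execution_seconds REAL,
            result TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO soda_scan_metadata VALUES (?, ?, ?, ?, ?)",
        [
            (r.run_timestamp, r.dataset, r.test_name, r.execution_seconds, r.result)
            for r in records
        ],
    )
    conn.commit()


# In-memory database so the sketch runs without any warehouse credentials.
conn = sqlite3.connect(":memory:")
store_metadata(
    conn,
    [ScanTestRecord("2021-10-26T15:55:50", "my_table", "row count == 0", 10.0, "FAILED")],
)
stored = conn.execute(
    "SELECT dataset, test_name, result FROM soda_scan_metadata"
).fetchall()
```

A real implementation would need a warehouse-specific way to create and write to this table, which is the main cost of this approach.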
Additional context
- See dbt's `store_failures`
Actually, it does not necessarily have to be stored as a table in the warehouse. Other options are:
- Send the stats to Soda Cloud (when the user passes a flag) and show them as a dataset. Maybe include default monitors, such as execution time increasing a lot (+100% vs. last run)
- Add it to the run results
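As a rough illustration of the "+100% vs last run" monitor mentioned above, the check could be as simple as comparing per-test execution times between two runs. This is only a sketch: the function name `slow_tests`, the dictionary shape, and the sample timings are assumptions made up for the example, not anything Soda provides.

```python
def slow_tests(previous, current, threshold=1.0):
    """Return tests whose execution time grew by more than `threshold`.

    `previous` and `current` map test names to execution time in seconds.
    A threshold of 1.0 means an increase of more than 100%.
    """
    flagged = []
    for name, seconds in current.items():
        before = previous.get(name)
        # Skip tests with no earlier timing (or a zero timing) to compare against.
        if before and (seconds - before) / before > threshold:
            flagged.append(name)
    return flagged


# Made-up timings for two consecutive runs.
previous = {"row count == 0": 10.0, "some other test": 4.0}
current = {"row count == 0": 25.0, "some other test": 5.0}
flagged = slow_tests(previous, current)
```

Here only the first test is flagged, since it went from 10 to 25 seconds (+150%), while the second grew by 25%.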
Btw, these options are not mutually exclusive. I like the first one because it is user friendly: it allows users to start monitoring Soda scans very quickly. The second one is more flexible: it allows developers to push this information anywhere they like.
Also, I think these options are easier to implement than my first suggestion, which requires new functionality for every warehouse package (the option to create a table within the warehouse), whereas the two options mentioned in this comment are warehouse independent.
Part of this is handled by sodadata/soda-core#543, which adds OpenTelemetry to the scans.
@vijaykiran how do we proceed? Maybe describe what sodadata/soda-core#543 does exactly, which metrics it yields, and adapt this issue accordingly?
@fakirAyoub good question. I've updated sodadata/soda-core#543 with a bit more info on how things are derived, now that we're entering the implementation phase, which should help answer your question and @JCZuurmond's a bit better.