Parse failed rows as part of the package
Failed dbt test results return an error message that includes the number of failed rows. Today we load the message as is in the package, and parse the number in the CLI using Python. If we move the parsing to the package, users could run their analysis on these data.
Hello! Would it be also possible to return the number of total failed rows. For now, the error messages includes the number of singular different failed rows but not the total occurrences of rows that are failed. The goal is to track over time the percentage of failures through our tests. Thanks!
Copying from Slack for documentation:
The process is usually -
- forking the repo, in this case the changes are in the package so you'll need to fork this repo
- Changing the code
- Opening a PR
We can start by first extracting the failed results number from the message using a regex. We should save this failed_results as a new column failed_results in elementary_test_results. First we need to add this column here. This basically a macro that the elementary_test_results model uses to know what the schema of this table should be. Then we need to extract the failed results from the message in a macro called handle_test_results, around here. This macro responsible for storing all the test results in the elementary_test_results table. To extract it we need to run a simple regex that extracts a number from the message, the result should be saved in a new field in the dict here, called 'failed_results' as this is the same name of the column we created.
@oravi
I checked the result object as you said, and saw the 'failures' field:
message='Got 3100 results, configured to fail if != 0', failures=3100
message='Got 1 result, configured to fail if != 0', failures=1
Do you think it's better to use it?
Hi @aqwvinh, we think that this might be a quicker win than the regex. What do you say?
Hello! I will try to find some time to work on the PR this week but as a quick answer to your last message, it can be quicker indeed ! But what about if the test passes? Does it return failures=0?
Good question, it returns the number of failures. If the test has no special config and passes it returns 0. If the test is configured to fail on failures>100 and there are 5 failures, it will return 5 for a passed test. If there is an error in running the test - it returns None.
(I checked on dbt 1.0.0 and above)
Released on 0.4.12 of the dbt pakage! @aqwvinh