elementary icon indicating copy to clipboard operation
elementary copied to clipboard

Allow for non-integer values to be returned in the `should_error` and `should_warn` fields of generic test SQL; ie. do not store `failures` as an `integer` or enforce integer data types in this schema

Open garfieldthesam opened this issue 11 months ago • 1 comments

Is your feature request related to a problem? Please describe.

One can define a custom fail_calc for their generic test. One can define a calculation that returns a non-integer value. For example, I modified the dbt_utils.not_null_proportion test to use this config: {{ config(fail_calc = 'coalesce(not_null_proportion, 0)') }}

This allows me to configure warn and error thresholds like so:

      - name: test_name
        test_name: dbt_utils.not_null_proportion
        column_name: "some_column"
        where: some_where_clause
        config:
          severity: warn
          warn_if: <= 0.98
          error_if: <= 0.95

Now compiled test SQL looks like this in Databricks:

select
  coalesce(not_null_proportion, 0) as failures,
  coalesce(not_null_proportion, 0) <= 0.98 as should_warn,
  coalesce(not_null_proportion, 0) <= 0.95 as should_error
from
  (
    with validation as (
      select
        sum(
          case
            when some_column is null then 0
            else 1
          end
        ) / cast(count(*) as numeric) as not_null_proportion
      from
        some_table
    ),
    validation_errors as (
      select
        not_null_proportion
      from
        validation
      where
        not_null_proportion < 1.00000000001
        or not_null_proportion > 1
    )
    select
      *
    from
      validation_errors
  ) dbt_internal_test

This would work, except Elementary throws this error in the on-run-end hook:

0.98657576579 is not of type 'integer'

Failed validating 'type' in schema['properties']['failures']:
    {'type': 'integer'}

On instance['failures']:
    0.98657576579

Describe the solution you'd like

Allow for non-integer values in the failures field when running an on-run-end-hook

Describe alternatives you've considered

Unfortunately I have to not use the preferred fail_calc for several of my tests

Would you be willing to contribute this feature?

Possibly, but not any time soon

garfieldthesam avatar Mar 09 '24 00:03 garfieldthesam

Hi @garfieldthesam !

Thanks for opening this issue and sorry for the late response. Very interesting use case.

However, are you sure that this error is actually coming from Elementary? From what I can see - dbt's base result class requires the failures value to be an integer - which I believe is the cause of this error.

haritamar avatar May 20 '24 13:05 haritamar