
Elementary test tries to cast `metric_value` to `numeric(28,6)`, which fails on our larger data types

jan-benisek opened this issue 11 months ago

Describe the bug

When running Elementary tests on a model

dbt test -s foobar --vars "{'elementary_enabled': true}" -t prod

We get an error:

Numeric data overflow (scale float to decimal)

To Reproduce

When running the dbt command above, I tried to localise which query causes the error. In dbt.log I found this one:

[0m09:56:43.769788 [debug] [Thread-2 (]: On test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6: /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "bigcorp", "target_name": "prod", "node_id": "test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6"} */

    with anomaly_scores as (
        select
            id,
...

when column_name is not null then 
    'In column ' || column_name || ', the last ' || metric_name || ' value is ' || cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)) ||
    '. The average for this metric is ' || cast(round(cast(training_avg as numeric(28,6)), 3) as varchar(4096)) || '.'

        else null
    end as anomaly_description
...

This is the problematic part: round(cast(metric_value as numeric(28,6)), 3). The inner cast of metric_value to numeric(28,6) is what overflows.
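Reduced to a stand-alone statement, the failing pattern looks roughly like this (the literal is the variance value observed further down; this is only an illustration of the cast that overflows, not a query Elementary actually runs):

select cast(round(cast(cast(3636013502843636000000 as float) as numeric(28,6)), 3) as varchar(4096));
-- the inner cast of the float to numeric(28,6) is what raises
-- "Numeric data overflow (scale float to decimal)" per the error above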

I saw that Elementary runs some inserts, and in one of the subqueries it computes variance(cast("custom_int" as float)).

 insert into "prod"."elementary"."test_656a1159e6_elementary_all_columns_anomalies_foobar___metrics__tmp_20250110085236933131"
        with tmp_table as (

...

column_monitors as (select
                        edr_bucket_start as bucket_start,
                        edr_bucket_end as bucket_end,

    datediff(
        hour,
        edr_bucket_start,
        edr_bucket_end
        )
 as bucket_duration_hours,
                    cast('custom_int' as varchar(4096)) as edr_column_name,

    coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
 as null_count,

      round(cast(cast(
    coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
 as float) / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3)

 as null_percent,
null as not_null_percent,
max(cast("custom_int" as float)) as max,
min(cast("custom_int" as float)) as min,
avg(cast("custom_int" as float)) as average,
    coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0)
 as zero_count,

      round(cast(cast(
    coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0)
 as float) / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3)

 as zero_percent,
null as not_zero_percent,
stddev(cast("custom_int" as float)) as standard_deviation,
variance(cast("custom_int" as float)) as variance,
null as max_length,
null as min_length,
null as average_length,
null as missing_count,
null as missing_percent,
null as count_true,
null as count_false,
null as not_missing_percent,
null as sum
                from filtered_monitored_table
                    left join buckets on (edr_bucket_start = start_bucket_in_data)

                    group by 1,2,3,4

That variance, in our case, yields a very large number:

select variance(cast("custom_int" as float)) from production.foobart where ts >= '2024-12-31' and ts <= '2025-01-10';
-- 3636013502843636000000

That value cannot be cast to numeric(28,6) later (numeric(28,6) leaves at most 22 digits before the decimal point).
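For context, this is easy to hit because variance grows with the square of the values. A minimal illustration with made-up data (not from our table):

-- two hypothetical rows roughly 1.2e11 apart
select variance(cast(v as float)) as var_value
from (select 0 as v union all select 120000000000) t;
-- ((6e10)^2 + (6e10)^2) / (2 - 1) = 7.2e21, already 22 digits before the decimal point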

Expected behavior

The test should handle larger values without a numeric overflow.
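One possible direction, sketched here only as an illustration and not as Elementary's actual code, would be to widen the precision used in the anomaly description, since Redshift decimals go up to precision 38:

-- hypothetical widening of the expression from the compiled test above
cast(round(cast(metric_value as numeric(38,6)), 3) as varchar(4096))
-- numeric(38,6) leaves 32 digits before the decimal point, enough for the 22-digit variance here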

Screenshots N/A

Environment (please complete the following information):

  • Elementary CLI (edr) version: 0.15.1
  • Elementary dbt package version: 0.16.0
  • dbt version you're using: 1.8.6
  • Data warehouse: Redshift
  • Infrastructure details: NA

Additional context NA

Would you be willing to contribute a fix for this issue? Unlikely, unless it is easy.

jan-benisek · Jan 10 '25