dbt-utils icon indicating copy to clipboard operation
dbt-utils copied to clipboard

[Bug] dbt run `--empty` tag doesn't work with `dbt_utils.get_column_values()`

Open Ananz2022 opened this issue 10 months ago • 9 comments

Is this a new bug in dbt-core?

  • [x] I believe this is a new bug in dbt-core
  • [x] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We use --empty to create empty tables. If there is no record in the table, dbt_utils.get_column_values() fails with a NoneType' object is not iterable error as it does not found values to loop through.

similar issue found with

  • dbt_utils.pivot()
  • dbt_utils.get_query_results_as_dict()

Expected Behavior

Function to create some dummy records to make it pass.

  • dbt_utils.get_column_values()
  • dbt_utils.pivot()
  • dbt_utils.get_query_results_as_dict()

Steps To Reproduce

Use dbt_utils macros for empty tables

dbt_utils.get_query_results_as_dict(SELECT * FROM empty_table)

Relevant log output


Environment

dbt version: 1.9.2
python version: 3.11.1
adapter type: snowflake
adapter version: 1.9.1

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

Ananz2022 avatar Mar 07 '25 21:03 Ananz2022

Thanks for reaching out @Ananz2022 !

Did you try using the default parameter of dbt_utils.get_column_values, etc?

When I followed the reprex below, I was able to get the same error as you. ❌ But once I added , default=[] within the call to dbt_utils.get_column_values, then the error went away. ✅

Reprex

Create these files:

models/a.sql

select 1 as payment_method

models/b.sql


-- Returns a list of the payment_methods in the `a` model
{% set payment_methods = dbt_utils.get_column_values(table=ref('a'), column='payment_method', default=[]) %}

{{ log("payment_methods: " ~ payment_methods, True) }}

select 0 as id

{% for payment_method in payment_methods %}
    , 1 as {{ payment_method }}_count
{% endfor %}

Run this command:

dbt build -s +b --empty

See this error:

Compilation Error in model b (models/b.sql)
  'NoneType' object is not iterable

Then add the default parameter like this:

{% set payment_methods = dbt_utils.get_column_values(table=ref('a'), column='payment_method', default=[]) %}

Then re-run the original command and see that everything works.

dbeatty10 avatar Mar 25 '25 21:03 dbeatty10

Hi @dbeatty10 , this works perfectly! Thanks so much!

I just wanted to follow up on the other dbt utils macros, such as:

  • dbt_utils.pivot()
  • dbt_utils.get_query_results_as_dict()
  • dbt_utils.date_spine()
  • and others.

Do you have any suggestions on how these interact with --Empty?

We’re using the Empty tag for CI validation purposes. Is there a way to set default values for these so our CI can pass successfully? Or, is there a way for --Empty to bypass the macros that require values to guarantee that the CI passes?

Ananz2022 avatar Mar 27 '25 13:03 Ananz2022

You're welcome @Ananz2022 !

We’re using the Empty tag for CI validation purposes. Is there a way to set default values for these so our CI can pass successfully? Or, is there a way for --Empty to bypass the macros that require values to guarantee that the CI passes?

I think it will depend on the macro and what you are doing with the output.

Doing some simple examples of each of those macros worked for me along with the --empty flag. See below for details.

In order to reproduce the issues you are seeing, can you give a simple examples of each of the macros that don't work the way you expect with the --empty flag?

Examples

Create these files:

seeds/c.csv

size,color
S,red
S,blue
S,red
M,red

models/a.sql

select 'visa' as payment_method

models/dbt_utils__date_spine.sql

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="cast('2020-01-10' as date)"
   )
}}

models/dbt_utils__get_column_values.sql


-- Returns a list of the payment_methods in the `a` model
{% set payment_methods = dbt_utils.get_column_values(table=ref('a'), column='payment_method', default=[]) %}

select 0 as id

{% for payment_method in payment_methods %}
    , 1 as {{ payment_method }}_count
{% endfor %}

models/dbt_utils__get_query_results_as_dict.sql

{% set sql_statement %}
    select size, color from {{ ref('c') }}
{% endset %}

{%- set places = dbt_utils.get_query_results_as_dict(sql_statement) -%}

{{ log("places: " ~ places, True) }}

{% if target.type == "snowflake" %}
  {%- set sizes = places['SIZE'] -%}
  {%- set colors = places['COLOR'] -%}
{% else %}
  {%- set sizes = places['size'] -%}
  {%- set colors = places['color'] -%}
{% endif %}

select

    {% for size in sizes | unique -%}
      sum(case when size = '{{ size }}' then 1 else 0 end) as users_in_{{ dbt_utils.slugify(size) }},
    {% endfor %}

    {% for color in colors | unique -%}
      sum(case when color = '{{ color }}' then 1 else 0 end) as users_in_{{ dbt_utils.slugify(color) }},
    {% endfor %}

    count(*) as total_total

from {{ ref('c') }}

models/dbt_utils__pivot.sql

select
    size,
    {{ dbt_utils.pivot(
        'color',
        dbt_utils.get_column_values(ref('c'), 'color', default=[])
    ) }}
from {{ ref('c') }}
group by size

Run these commands:

Get this output:

$ dbt build --empty

01:19:44  Running with dbt=1.9.3
01:19:44  Registered adapter: duckdb=1.9.2
01:19:44  Found 5 models, 1 seed, 540 macros
01:19:44  
01:19:44  Concurrency: 1 threads (target='duckdb')
01:19:44  
01:19:45  1 of 6 START sql view model dbt_triage.a ....................................... [RUN]
01:19:45  1 of 6 OK created sql view model dbt_triage.a .................................. [OK in 0.03s]
01:19:45  2 of 6 START sql view model dbt_triage.dbt_utils__date_spine ................... [RUN]
01:19:45  2 of 6 OK created sql view model dbt_triage.dbt_utils__date_spine .............. [OK in 0.02s]
01:19:45  3 of 6 START seed file dbt_triage.c ............................................ [RUN]
01:19:45  3 of 6 OK loaded seed file dbt_triage.c ........................................ [INSERT 4 in 0.03s]
01:19:45  4 of 6 START sql view model dbt_triage.dbt_utils__get_column_values ............ [RUN]
01:19:45  4 of 6 OK created sql view model dbt_triage.dbt_utils__get_column_values ....... [OK in 0.01s]
01:19:45  5 of 6 START sql view model dbt_triage.dbt_utils__get_query_results_as_dict .... [RUN]
01:19:45  5 of 6 OK created sql view model dbt_triage.dbt_utils__get_query_results_as_dict  [OK in 0.02s]
01:19:45  6 of 6 START sql view model dbt_triage.dbt_utils__pivot ........................ [RUN]
01:19:45  6 of 6 OK created sql view model dbt_triage.dbt_utils__pivot ................... [OK in 0.02s]
01:19:45  
01:19:45  Finished running 1 seed, 5 view models in 0 hours 0 minutes and 0.26 seconds (0.26s).
01:19:45  
01:19:45  Completed successfully
01:19:45  
01:19:45  Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6

dbeatty10 avatar Mar 28 '25 01:03 dbeatty10

@dbeatty10 Thank you! Below are examples of each macro I mentioned above.

  1. dbt_utils.date_spine

seeds/test_a.csv

etl_extract_timestamp
2025-01-03 15:02:02.505 +0000
2025-01-31 15:02:00.048 +0000
2025-02-07 15:01:35.069 +0000
2024-11-27 14:27:49.342 +0000
2025-03-21 15:01:29.998 +0000

models/test_date_spine.sql

{% set start_date = "(select date_trunc('week', min(TO_TIMESTAMP(etl_extract_timestamp, 'YYYY-MM-DD HH24:MI:SS'))) from " + ref('test_a') | string + ")" %}

{% set end_date = "(select dateadd('week', 1, date_trunc('week', max(TO_TIMESTAMP(etl_extract_timestamp, 'YYYY-MM-DD HH24:MI:SS')))) from " + ref('test_a')| string + ")" %}

with date_spine as
(
    {{ dbt_utils.date_spine(
        datepart="week",
        start_date=start_date,
        end_date=end_date
    ) }}
)
select * from date_spine

Run command dbt run --select test_date_spine --empty Error we see

  '<=' not supported between instances of 'NoneType' and 'int'
  
  > in macro get_powers_of_two (macros/sql/generate_series.sql)
  > called by macro default__generate_series (macros/sql/generate_series.sql)
  > called by macro generate_series (macros/sql/generate_series.sql)
  > called by macro default__date_spine (macros/sql/date_spine.sql)
  > called by macro date_spine (macros/sql/date_spine.sql)
  > called by model test_date_spine (models/test_date_spine.sql)
  1. dbt_utils.pivot

seeds/test_a.csv

kpi
A
B
B
 

models/test_pivot.sql


 select
  {{ dbt_utils.pivot(
     'kpi', 
     dbt_utils.get_column_values(table=ref('test_a'), column='kpi',default=[]),
  ) }}
 from {{ ref('test_a') }}

Run command dbt run --select test_pivot--> It works!

Run command dbt run --select test_pivot --empty

Error we see


   'str object' has no attribute 'database'
   
   > in macro load_relation (macros/adapters/relation.sql)
   > called by macro default__get_column_values (macros/sql/get_column_values.sql)
   > called by macro get_column_values (macros/sql/get_column_values.sql)
   > called by model test_dbt_utils (models/test_dbt_utils.sql)

  1. dbt_utils.get_query_results_as_dict

seeds/test_a.csv

kpi,category,sign
A,C_1,+
B,C_2,-
B,C_3,+

models/test_query_result.sql


{% set kpi_mapping_sql %}
    SELECT * FROM {{ ref("test_a") }}
{% endset %}

{%- set kpis = dbt_utils.get_query_results_as_dict(kpi_mapping_sql) -%}

 {% for kpi in kpis["KPI"] %}

    {% set category = kpis["CATEGORY"][loop.index0] %}
    {% set sign = kpis["SIGN"][loop.index0] %}
    select '{{ kpi }}' as kpi, '{{ category }}' as category, '{{ sign }}' as sign
    {{ "UNION ALL" if not loop.last }}

{% endfor %}

Run command dbt run --select test_query_result--> It works!

Run command dbt run --select test_query_result --empty

Error we see

  syntax error line 5 at position 2 unexpected ')'.

Ananz2022 avatar Mar 28 '25 18:03 Ananz2022

Thanks for these examples @Ananz2022 !

These are all in the territory of not-quite-a-feature-request and not-quite-a-bug. The best description is "limitation" of the --empty flag.

Namely, --empty works by wrapping any refs and sources within a subquery to limit the results to zero rows.

In many cases, dbt code can handle getting no rows back well and is able to build out an empty table with the correct column names and schemas.

But once there's introspective steps in the mix, all bets are off -- receiving 0 rows might fundamentally fly in the face of the assumptions the code is built on. In those cases, then the --empty flag won't work.

In some cases, there might be an easy work around (i.e. default for dbt_utils.get_column_values). In other cases, it might be quite a bit more complicated (and maybe even impossible).

A "complicated" example might be the addition of the following logic to handle the case of 0 rows for your example of dbt_utils.get_query_results_as_dict:

{%- set kpis = dbt_utils.get_query_results_as_dict(kpi_mapping_sql) -%}

{%- if kpis["kpi"] | length == 0 -%}

    select 'x' as kpi, 'x' as category, 'x' as sign
    where 1=0

{% endif %}

I took a quick shot at each of your dbt_utils.date_spine and dbt_utils.pivot examples and came up short. So not sure if someone more clever than me can figure out workarounds for those cases. (If so, please share in this thread!).

Potential workarounds

You might find some success with adding tags to the models that can't be executed via --empty and then using defer their references to an environment that already has them built (by excluding those tags from the selection).

Assuming you have an artifacts directory with references to those models that have already been built, it might look like this (winging it, completely untested):

models:
  - name: test_query_result
    config:
      tags: ["no_empty"]
dbt parse --target target_with_my_fully_built_tables --target-path artifacts
dbt build --defer --state artifacts --exclude "tag:no_empty" --empty

Summary

I don't see us being able to do anything about these limitations of --empty anytime soon other than add an update to our documentation: https://github.com/dbt-labs/docs.getdbt.com/issues/7113

dbeatty10 avatar Mar 28 '25 23:03 dbeatty10

Thanks for the explanation @dbeatty10 .

Yes, I agree that there are edge cases where --empty is not designed to handle. Since the examples I provided came from different developers on my team, each with their own approach, it wouldn’t be practical to have them all go back and modify their logic to account for these edge cases.

Regarding workarounds:

  1. I agree using a tag and excluding those edge cases from CI is a good approach.

  2. I’ve also experimented with some override macros and used dispatch to prioritize them. They address some of the edge cases, though not all. I’m happy to share them here in case anyone finds them useful.

  3. I’m looking forward to the --sample feature in your next release. It should be helpful in addressing some of these edge cases that require small sample data.

The override I created

In dbt_project.yml

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['dbt_xxx', 'dbt_utils']

macros/overriden_builtins/pivot.sql

{% macro snowflake__pivot(column,
               values,
               alias=True,
               agg='sum',
               cmp='=',
               prefix='',
               suffix='',
               then_value=1,
               else_value=0,
               quote_identifiers=True,
               distinct=False) %}

  {# Check if values are empty or None #}
  {% if values is none or values == [] %}
    {# Log that pivoting will be done with no values #}
    {{ log('Pivot: No values to pivot. Using default zero values for all columns.', info=True) }}

    {# Default behavior: Returning case when structure for empty pivot columns #}
    {% set empty_column = prefix ~ 'empty' ~ suffix %}

    {# This generates a column with default values for the empty table case #}
    {{ agg }}(
      {% if distinct %} distinct {% endif %}
      case 
        when {{ column }} is not null then {{ then_value }}
        else {{ else_value }}
      end
    ) as {{ adapter.quote(empty_column) }}

  {% else %}
    {# Regular pivot behavior with provided values #}
    {% for value in values %}
      {{ agg }}(
        {% if distinct %} distinct {% endif %}
        case
        when {{ column }} {{ cmp }} '{{ dbt.escape_single_quotes(value) }}'
          then {{ then_value }}
        else {{ else_value }}
      end
      )
      {% if alias %}
        {% if quote_identifiers %}
              as {{ adapter.quote(prefix ~ value ~ suffix) }}
        {% else %}
          as {{ dbt_utils.slugify(prefix ~ value ~ suffix) }}
        {% endif %}
      {% endif %}
      {% if not loop.last %},{% endif %}
    {% endfor %}
  {% endif %}

{% endmacro %}

macros/overriden_builtins/get_query_results_as_dict.sql


{% macro snowflake__get_query_results_as_dict(query) %}
    {%- call statement('get_query_results', fetch_result=True, auto_begin=false) -%}
        {{ query }}
    {%- endcall -%}

    {% set sql_results = {} %}

    {%- if execute -%}
        {% set sql_results_table = load_result('get_query_results').table.columns %}

        {% for column_name, column in sql_results_table.items() %}
            {% if column.values() %}
                {% do sql_results.update({column_name: column.values()}) %}
            {% else %}
                {# If column is empty, assign a dummy value#}
                {% do sql_results.update({column_name: ' '}) %}
                {{ log('Column ' ~ column_name ~ ' has no values. Assigned empty value', info=True) }}
            {% endif %}

        {% endfor %}
    {%- endif -%}

    {{ return(sql_results) }}
{% endmacro %}

Ananz2022 avatar Mar 30 '25 15:03 Ananz2022

These are awesome @Ananz2022 ! 🤩

Would you be interested in opening up PRs in the dbt_utils repo for either (or both) of these?

The diffs seem pretty straight-forward, and then everyone could utilize them.

Here's the essence of the diffs that I saw:

diff --git a/macros/sql/get_query_results_as_dict.sql b/macros/sql/get_query_results_as_dict.sql
index 6548f2d..e64ef11 100644
--- a/macros/sql/get_query_results_as_dict.sql
+++ b/macros/sql/get_query_results_as_dict.sql
@@ -16,8 +16,16 @@
 
     {%- if execute -%}
         {% set sql_results_table = load_result('get_query_results').table.columns %}
         {% for column_name, column in sql_results_table.items() %}
+            {% if column.values() %}
             {% do sql_results.update({column_name: column.values()}) %}
+            {% else %}
+                {# If column is empty, assign a dummy value#}
+                {% do sql_results.update({column_name: ' '}) %}
+                {{ log('Column ' ~ column_name ~ ' has no values. Assigned empty value', info=True) }}
+            {% endif %}
         {% endfor %}
     {%- endif -%}
diff --git a/macros/sql/pivot.sql b/macros/sql/pivot.sql
index 3eabc72..551b9da 100644
--- a/macros/sql/pivot.sql
+++ b/macros/sql/pivot.sql
@@ -65,6 +65,26 @@ Arguments:
                else_value=0,
                quote_identifiers=True,
                distinct=False) %}
+  {# Check if values are empty or None #}
+  {% if values is none or values == [] %}
+    {# Log that pivoting will be done with no values #}
+    {{ log('Pivot: No values to pivot. Using default zero values for all columns.', info=True) }}
+
+    {# Default behavior: Returning case when structure for empty pivot columns #}
+    {% set empty_column = prefix ~ 'empty' ~ suffix %}
+
+    {# This generates a column with default values for the empty table case #}
+    {{ agg }}(
+      {% if distinct %} distinct {% endif %}
+      case
+        when {{ column }} is not null then {{ then_value }}
+        else {{ else_value }}
+      end
+    ) as {{ adapter.quote(empty_column) }}
+
+  {% else %}
+  {# Regular pivot behavior with provided values #}
   {% for value in values %}
     {{ agg }}(
       {% if distinct %} distinct {% endif %}
@@ -83,4 +103,5 @@ Arguments:
     {% endif %}
     {% if not loop.last %},{% endif %}
   {% endfor %}
+  {% endif %}
 {% endmacro %}

dbeatty10 avatar Mar 30 '25 18:03 dbeatty10

@dbeatty10 Very excited excited to see your positive feedback! Here is the pull request I opened.

Thank you!

Ananz2022 avatar Mar 31 '25 23:03 Ananz2022

Transferred from https://github.com/dbt-labs/dbt-core/issues/11375

dbeatty10 avatar Apr 03 '25 23:04 dbeatty10

I've stumbled on the same problem, but with dbt_utils.star. However, the "default" parameter doesn't seem to fit. Is there another workaround?

rodrigoccurvo avatar Dec 01 '25 18:12 rodrigoccurvo