dbt-sqlserver icon indicating copy to clipboard operation
dbt-sqlserver copied to clipboard

Slow performance when materialization as table (`SELECT * INTO Model__dbt_tmp FROM Model__dbt_tmp_temp_view`)

Open romiof opened this issue 2 years ago • 5 comments

Hello!

This is a bug and a suggestion for improvement... I'm suffering with slow performance when materialization as table.

My environment

I started this month to use dbt for MS SQL. My database is a SQL Server 2014. Using the last version of dbt 1.4.3

My issue

I'm having problems whit materialization macro which make the tables at our SQL Server. Its performance is very, very slow.

More specific, my project consist in:

  • Using source yml files, to map my tables to dbt objects
  • Created views for stage layer, with this views querying my source tables. as suggested for dbt team here
  • Created my Dimensions and Fact entities as tables
  • Here, when I simply make a SELECT * on one model SQL, in the worst case, it takes less than 1 minute to return all lines.
  • But when using dbt run / dbt build it performs very poor.

What I investigated

  • In my case, the _Model__dbt_tmp_temp_view_ is a nested view, and this is the origin of my problem.
  • With a simple SELECT * FROM _Model__dbt_tmp_temp_view_ , everything works fine, the performance is equal when I select without the temp view.
  • But when using the INTO Model__dbt_tmp (https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/materializations/models/table/create_table_as.sql#LL23-L25) my SQL change its plan and it takes about 10 minutes to execute this materialization.
  • In last two days, I make a lot of tests, changed my JOINs, refactorated my SELECT but when I use nested views (because I'm following dbt best pratice) I got this slow performance. I'm not a expert to debug my execution plan, just I realized that these two queries make different plans ( SELECT * FROM _Model__dbt_tmp_temp_view_ vs SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_)
  • Maybe it is a glitch in our old SQL Server 2014

A possible workaround

I also discovered, that there is a hint which make SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ runs with its original perfomance: OPTION (FORCE ORDER)

I founded it from this SO post.

So, if I run this query SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ OPTION (FORCE ORDER) from my MS SQL Studio, it create my Model table in less than 1 minute.

Setting this option at dbt-sqlserver

I'd like to know, how can we change this macro to have a possibility to include this hint ? Maybe a configuration, like as_columnstore ?

romiof avatar May 29 '23 12:05 romiof

For what it’s worth, nested views perform horribly in SQL Server because of how the query planner works.

That being said, I really like the idea of supporting query hints as config in the macros.

matsonj avatar May 29 '23 15:05 matsonj

Came in here from google. We're experiencing the same issue, dreadfully slow performance in dbt, We're on sql server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't 1.5 yet). adding option (force order) into the create_table_as macro unfortunately did not fix the issue though:

{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
   {#- TODO: add contracts here when in dbt 1.5 -#}
   {%- set sql_header = config.get('sql_header', none) -%}
   {%- set as_columnstore = config.get('as_columnstore', default=true) -%}
   {%- set temp_view_sql = sql.replace("'", "''") -%}
   {%- set tmp_relation = relation.incorporate(
        path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
        type='view') -%}

   {{- sql_header if sql_header is not none -}}

    -- drop previous temp view
   {{- sqlserver__drop_relation_script(tmp_relation) }}

    -- create temp view
   USE [{{ relation.database }}];
   EXEC('create view {{ tmp_relation.include(database=False) }} as
    {{ temp_view_sql }}
    ');

   -- select into the table and create it that way
   {# TempDB schema is ignored, always goes to dbo #}
   SELECT *
   INTO {{ relation.include(database=False, schema=(not temporary))  }}
   FROM {{ tmp_relation }} option (force order)
   -- drop temp view
   {{ sqlserver__drop_relation_script(tmp_relation) }}

   {%- if not temporary and as_columnstore -%}
        -- add columnstore index
        {{ sqlserver__create_clustered_columnstore_index(relation) }}
   {%- endif -%}

{% endmacro %}

edvald-kvika avatar Jul 06 '23 15:07 edvald-kvika

Came in here from google. We're experiencing the same issue, dreadfully slow performance in dbt, We're on sql server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't 1.5 yet). adding option (force order) into the create_table_as macro unfortunately did not fix the issue though:

@edvald-kvika

Here, I've experienced some models with slow performance without option (force order). About 10~15% of my models are in this situation.

For them, I created a config at model code, and had created a copy of macro sqlserver__create_table_as with an if statement to implement force order in case my config == true.

At my job, our SQL Server is not exclusive to our DW. I've about a dozen of other DBs used by other workloads (in general OLTP, but also have a couple of DBs for OLAP, where our users connect their Excel's at SQL tables and analyze data with Pivot Table).

So, I've moments during the workday, where dbt run hang with slows performance, but it's during resource concurrency in SQL Server.

In general, without bottlenecks in SQL Server, a dbt run model_abc is about 1.5x to 2x the time needed to SELECT * the model_abc SQL code.

romiof avatar Jul 11 '23 12:07 romiof

@romiof we just patched sqlserver__create_table_as in a larger dbt project (adding option (force order)) and model performance time greatly improved, from around 900 seconds down to 30.

edvald-kvika avatar Jul 14 '23 09:07 edvald-kvika

Now we have a bit more experience with option (force order). In some cases it drastically improves performance and in other cases is does the exact opposite. Looking at the query plans for the select into query showed that in those cases option (force order) put the optimizer into a heavy (and early) table spool operation. Our solution was to add a config parameter for force_order:

{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
   {#- TODO: add contracts here when in dbt 1.5 -#}
   {%- set sql_header = config.get('sql_header', none) -%}
   {%- set as_columnstore = config.get('as_columnstore', default=true) -%}
   {%- set force_order = config.get('force_order', default=true) -%}
   {%- set temp_view_sql = sql.replace("'", "''") -%}
   {%- set tmp_relation = relation.incorporate(
        path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
        type='view') -%}

   {{- sql_header if sql_header is not none -}}

    -- drop previous temp view
   {{- sqlserver__drop_relation_script(tmp_relation) }}

    -- create temp view
   USE [{{ relation.database }}];
   EXEC('create view {{ tmp_relation.include(database=False) }} as
    {{ temp_view_sql }}
    ');

   -- select into the table and create it that way
   {# TempDB schema is ignored, always goes to dbo #}
   SELECT *
   INTO {{ relation.include(database=False, schema=(not temporary))  }}
   {%- if force_order %}
   -- add option (force order) to improve performance in nested views:
   -- https://github.com/dbt-msft/dbt-sqlserver/issues/410
   {%- endif %}
   FROM {{ tmp_relation }} {%- if force_order %} option (force order) {%- endif %}
   -- drop temp view
   {{ sqlserver__drop_relation_script(tmp_relation) }}

   {%- if not temporary and as_columnstore -%}
        -- add columnstore index
        {{ sqlserver__create_clustered_columnstore_index(relation) }}
   {%- endif -%}

{% endmacro %}

then in the table config we can turn it off where needed:

{{
  config(
    materialized = 'incremental',
    unique_key = ['account', 'level', 'month'],
    force_order = False
    )
}}

edvald-kvika avatar Jul 20 '23 16:07 edvald-kvika

Running into this now with some queries in the 130m row range.

Will try the macro overrides, but would be nice to see this merged into the main version.

ZachMassia avatar Jul 25 '25 20:07 ZachMassia

If anyone comes across this, the above code seems to have changed quite a bit.

I was able to find the macro to modify at https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/adapter/metadata.sql

I will still need to do some testing to see if things improve.

{% macro apply_label() %}
    {{ log (config.get('query_tag','dbt-sqlserver'))}}
    {%- set query_label = config.get('query_tag','dbt-sqlserver') -%}
    {%- set force_order = config.get('force_order', default=true) -%}
    
    {%- if force_order -%}
        -- Using FORCE ORDER as per https://github.com/dbt-msft/dbt-sqlserver/issues/410
        OPTION (LABEL = '{{query_label}}', FORCE ORDER);
    {%- else -%}
        OPTION (LABEL = '{{query_label}}');
    {%- endif -%}
{% endmacro %}

ZachMassia avatar Jul 28 '25 14:07 ZachMassia