Slow performance when materialization as table (`SELECT * INTO Model__dbt_tmp FROM Model__dbt_tmp_temp_view`)
Hello!
This is a bug and a suggestion for improvement... I'm suffering with slow performance when materialization as table.
My environment
I started this month to use dbt for MS SQL. My database is a SQL Server 2014. Using the last version of dbt 1.4.3
My issue
I'm having problems whit materialization macro which make the tables at our SQL Server. Its performance is very, very slow.
More specific, my project consist in:
- Using source
yml files, to map my tables to dbt objects - Created views for stage layer, with this views querying my source tables. as suggested for dbt team here
- Created my Dimensions and Fact entities as
tables - Here, when I simply make a SELECT * on one model SQL, in the worst case, it takes less than 1 minute to return all lines.
- But when using
dbt run / dbt buildit performs very poor.
What I investigated
- In my case, the
_Model__dbt_tmp_temp_view_is a nested view, and this is the origin of my problem. - With a simple
SELECT * FROM _Model__dbt_tmp_temp_view_, everything works fine, the performance is equal when I select without the temp view. - But when using the
INTO Model__dbt_tmp(https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/materializations/models/table/create_table_as.sql#LL23-L25) my SQL change its plan and it takes about 10 minutes to execute this materialization. - In last two days, I make a lot of tests, changed my JOINs, refactorated my SELECT but when I use nested views (because I'm following dbt best pratice) I got this slow performance. I'm not a expert to debug my execution plan, just I realized that these two queries make different plans (
SELECT * FROM _Model__dbt_tmp_temp_view_vsSELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_) - Maybe it is a glitch in our old SQL Server 2014
A possible workaround
I also discovered, that there is a hint which make SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ runs with its original perfomance:
OPTION (FORCE ORDER)
I founded it from this SO post.
So, if I run this query SELECT * INTO Model__dbt_tmp FROM _Model__dbt_tmp_temp_view_ OPTION (FORCE ORDER) from my MS SQL Studio, it create my Model table in less than 1 minute.
Setting this option at dbt-sqlserver
I'd like to know, how can we change this macro to have a possibility to include this hint ?
Maybe a configuration, like as_columnstore ?
For what it’s worth, nested views perform horribly in SQL Server because of how the query planner works.
That being said, I really like the idea of supporting query hints as config in the macros.
Came in here from google. We're experiencing the same issue, dreadfully slow performance in dbt, We're on sql server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't 1.5 yet). adding option (force order) into the create_table_as macro unfortunately did not fix the issue though:
{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
{#- TODO: add contracts here when in dbt 1.5 -#}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set as_columnstore = config.get('as_columnstore', default=true) -%}
{%- set temp_view_sql = sql.replace("'", "''") -%}
{%- set tmp_relation = relation.incorporate(
path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
type='view') -%}
{{- sql_header if sql_header is not none -}}
-- drop previous temp view
{{- sqlserver__drop_relation_script(tmp_relation) }}
-- create temp view
USE [{{ relation.database }}];
EXEC('create view {{ tmp_relation.include(database=False) }} as
{{ temp_view_sql }}
');
-- select into the table and create it that way
{# TempDB schema is ignored, always goes to dbo #}
SELECT *
INTO {{ relation.include(database=False, schema=(not temporary)) }}
FROM {{ tmp_relation }} option (force order)
-- drop temp view
{{ sqlserver__drop_relation_script(tmp_relation) }}
{%- if not temporary and as_columnstore -%}
-- add columnstore index
{{ sqlserver__create_clustered_columnstore_index(relation) }}
{%- endif -%}
{% endmacro %}
Came in here from google. We're experiencing the same issue, dreadfully slow performance in dbt, We're on sql server 2019 and dbt 1.4.6 (since dbt-sqlserver isn't 1.5 yet). adding
option (force order)into thecreate_table_asmacro unfortunately did not fix the issue though:
@edvald-kvika
Here, I've experienced some models with slow performance without option (force order).
About 10~15% of my models are in this situation.
For them, I created a config at model code, and had created a copy of macro sqlserver__create_table_as with an if statement to implement force order in case my config == true.
At my job, our SQL Server is not exclusive to our DW. I've about a dozen of other DBs used by other workloads (in general OLTP, but also have a couple of DBs for OLAP, where our users connect their Excel's at SQL tables and analyze data with Pivot Table).
So, I've moments during the workday, where dbt run hang with slows performance, but it's during resource concurrency in SQL Server.
In general, without bottlenecks in SQL Server, a dbt run model_abc is about 1.5x to 2x the time needed to SELECT * the model_abc SQL code.
@romiof we just patched sqlserver__create_table_as in a larger dbt project (adding option (force order)) and model performance time greatly improved, from around 900 seconds down to 30.
Now we have a bit more experience with option (force order). In some cases it drastically improves performance and in other cases is does the exact opposite. Looking at the query plans for the select into query showed that in those cases option (force order) put the optimizer into a heavy (and early) table spool operation. Our solution was to add a config parameter for force_order:
{% macro sqlserver__create_table_as(temporary, relation, sql) -%}
{#- TODO: add contracts here when in dbt 1.5 -#}
{%- set sql_header = config.get('sql_header', none) -%}
{%- set as_columnstore = config.get('as_columnstore', default=true) -%}
{%- set force_order = config.get('force_order', default=true) -%}
{%- set temp_view_sql = sql.replace("'", "''") -%}
{%- set tmp_relation = relation.incorporate(
path={"identifier": relation.identifier.replace("#", "") ~ '_temp_view'},
type='view') -%}
{{- sql_header if sql_header is not none -}}
-- drop previous temp view
{{- sqlserver__drop_relation_script(tmp_relation) }}
-- create temp view
USE [{{ relation.database }}];
EXEC('create view {{ tmp_relation.include(database=False) }} as
{{ temp_view_sql }}
');
-- select into the table and create it that way
{# TempDB schema is ignored, always goes to dbo #}
SELECT *
INTO {{ relation.include(database=False, schema=(not temporary)) }}
{%- if force_order %}
-- add option (force order) to improve performance in nested views:
-- https://github.com/dbt-msft/dbt-sqlserver/issues/410
{%- endif %}
FROM {{ tmp_relation }} {%- if force_order %} option (force order) {%- endif %}
-- drop temp view
{{ sqlserver__drop_relation_script(tmp_relation) }}
{%- if not temporary and as_columnstore -%}
-- add columnstore index
{{ sqlserver__create_clustered_columnstore_index(relation) }}
{%- endif -%}
{% endmacro %}
then in the table config we can turn it off where needed:
{{
config(
materialized = 'incremental',
unique_key = ['account', 'level', 'month'],
force_order = False
)
}}
Running into this now with some queries in the 130m row range.
Will try the macro overrides, but would be nice to see this merged into the main version.
If anyone comes across this, the above code seems to have changed quite a bit.
I was able to find the macro to modify at https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/adapter/metadata.sql
I will still need to do some testing to see if things improve.
{% macro apply_label() %}
{{ log (config.get('query_tag','dbt-sqlserver'))}}
{%- set query_label = config.get('query_tag','dbt-sqlserver') -%}
{%- set force_order = config.get('force_order', default=true) -%}
{%- if force_order -%}
-- Using FORCE ORDER as per https://github.com/dbt-msft/dbt-sqlserver/issues/410
OPTION (LABEL = '{{query_label}}', FORCE ORDER);
{%- else -%}
OPTION (LABEL = '{{query_label}}');
{%- endif -%}
{% endmacro %}