dbt-clickhouse icon indicating copy to clipboard operation
dbt-clickhouse copied to clipboard

query_settings not being correctly read

Open adrevant opened this issue 1 year ago • 3 comments

Describe the bug

I'm trying to build an incremental table that is basically built by using various joins with other tables. My current approach uses a lot of memory, but by enabling the following options the amount of memory significantly decreases:

SET join_algorithm = 'full_sorting_merge';
SET optimize_read_in_order = 1;
SET optimize_sorting_by_input_stream_properties = 1;

So, I tried to use the same config in my dbt incremental model by enabling those params using the query_settings dictionary but it fails to read the settings for the INSERT INTO query.

Code examples, such as models or profile settings

  • model settings:
{{
    config(
        materialized = "incremental",
        engine = "ReplicatedMergeTree",
        unique_key = "transaction_id",
        order_by = '(txn_time_q, txn_result, txn_type, shop_id, transaction_id)',
        partition_by = 'txn_time_year',
        incremental_strategy = 'delete+insert',
        settings = {'allow_nullable_key': 1,},
       query_settings = {"join_algorithm": "full_sorting_merge",  "optimize_read_in_order": 1,
                          "optimize_sorting_by_input_stream_properties": 1,},
    )
}}
  • code being ran by dbt:
insert into warehouse_dev.transactions_test
        ("transaction_id", "last_mod", "shop_id", "transaction_added", "txn_time_year", "txn_time_q", "txn_result", "txn_type", "email")


WITH transactions AS (
    SELECT
        transaction_id,
        shop_id,
        last_mod,
        transaction_added,
        txn_result,
        txn_type
    FROM warehouse_dev.stg_transactions
        WHERE
            last_mod > (
                SELECT MAX(last_mod) FROM warehouse_dev.transactions
                WHERE
                    txn_time_year
                    >= date_trunc(
                        'year', now('America/Mexico_City') - INTERVAL '365 day'
                    )
            )
),

event AS (
    SELECT
        transaction_id,
        email
    FROM warehouse_dev.stg_transaction_event_card
        WHERE
        	last_mod > (
                SELECT MAX(last_mod) FROM warehouse_dev.transactions
                WHERE
                    txn_time_year
                    >= date_trunc(
                        'year', now('America/Mexico_City') - INTERVAL '365 day'
                    )
            )
),

shops AS (
    SELECT
        shop_id,
        company
    FROM warehouse_dev.stg_shops
),

transaction_join AS (
    SELECT
        t.transaction_id AS transaction_id,
        t.last_mod AS last_mod,
        t.shop_id AS shop_id,
        t.transaction_added AS transaction_added,
        e.email AS email,
        s.company AS company,
        t.txn_result AS txn_result,
        t.txn_type AS txn_type
    FROM transactions AS t
    LEFT JOIN event AS e ON e.transaction_id = t.transaction_id
    LEFT JOIN SHOP AS s ON s.shop_id = t.shop_id
)

SELECT
    t1.transaction_id AS transaction_id,
    t1.last_mod AS last_mod,
    t1.shop_id AS shop_id,
    t1.transaction_added AS transaction_added,
    date_trunc('year', t1.transaction_added) AS txn_time_year,
    date_trunc('quarter', t1.transaction_added) AS txn_time_q,
    t1.email AS email,
    t1.company AS company,
    t1.txn_result AS txn_result,
    t1.txn_type AS txn_type
FROM transaction_join AS t1

-- limit 1000

SETTINGS  join_algorithm=full_sorting_merge,  optimize_read_in_order=1,  optimize_sorting_by_input_stream_properties=1

dbt and/or ClickHouse server logs

Code: 62. DB::Exception: Syntax error: failed at position M (',') (line N, col N1): ,  optimize_read_in_order=1,  optimize_sorting_by_input_stream_properties=1. Expected one of: token, Dot, OpeningRoundBracket, end of query. (SYNTAX_ERROR) (version 23.11.2.11 (official build))

Configuration

Environment

  • dbt version: 1.6.9
  • dbt-clickhouse version: 1.6.2
  • clickhouse-driver version (if using native): 0.2.6
  • clickhouse-connect version (if using http): 0.6.23
  • Python version: 3.11.6
  • Operating system: linux/amd64

adrevant avatar Jan 16 '24 22:01 adrevant

Running the exact same query on dbaver but with 'full_sorting_merge' quoted actually worked:

...
SETTINGS  join_algorithm='full_sorting_merge',  optimize_read_in_order=1,  optimize_sorting_by_input_stream_properties=1

adrevant avatar Jan 16 '24 22:01 adrevant

I think there's an issue with rendering settings or query_settings that are string values. You may need to add single quotes in your configuration so that the macro renders them correctly (with single quotes for non-numeric values), so in your example, you could try something like:

query_settings = {"join_algorithm": "'full_sorting_merge'",  "optimize_read_in_order": 1,
                          "optimize_sorting_by_input_stream_properties": 1,},

The macros should really detect and render the strings correctly quoted, so we'll get a fix for that in an upcoming release.

genzgd avatar Jan 16 '24 23:01 genzgd

Thank you! Your suggestion worked just fine. Looking forward for the next release!

adrevant avatar Jan 17 '24 13:01 adrevant