dbt-clickhouse icon indicating copy to clipboard operation
dbt-clickhouse copied to clipboard

Multiple materialized views on the same target distributed_table with a config possible?

Open rightx2 opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe. When I defiend materialized_view like below,

{{ config(
       materialized='materialized_view',
       order_by=['stock_code', 'second'],
       sharding_key='cityHash64(stock_code)',
       engine='ReplicatedAggregatingMergeTree()',
       settings = { 'index_granularity': 8192, 'allow_nullable_key': 1 },
       ttl='toDateTime(second) + toIntervalHour(1)'
}}

SELECT
    stock_code,
    toStartOfSecond(__ts_ms) AS second,
    countState(*) AS pv
FROM {{ source("mydb", "stocks") }}
GROUP BY
    stock_code,
    second

I understood it created agg_pv table and agg_pv_mv view and I can query like below

SELECT stock_code, countMerge(pv) FROM analytics_dbt_dev.agg_pv GROUP BY 1 ORDER BY 2 desc

but I'd like to do more:

  1. I'd like to create target materialized table as distributed table. If it worked, agg_pv, agg_pv_local would be created. Related issue is https://github.com/ClickHouse/dbt-clickhouse/issues/302 and it closed as completed, but I can't fully understand how the answer work:
{{ config(materialized='view') }}

SELECT *
FROM clusterAllReplicas('cluster_name', {{ ref('local_table') }}) SETTINGS final = 1

Moreover, official docs said, it's kinda inefficient..

Using the cluster and clusterAllReplicas table functions are less efficient than creating a Distributed table because in this case, the server connection is re-established for every request. When processing a large number of queries, please always create the Distributed table ahead of time, and do not use the cluster and clusterAllReplicas table functions.

I wonder whether I can implement it with one model sql file with config first, but if can't, I'd like to know any other ways as well.

  1. I'd like to set two materialized views, from differenet source, and targeting on the same table like below:
A --> target_t_mv --> target_t (distributed)
B --> target_t_mv --> target_t (distributed)

I wonder whether I can do this with config. If can't, I'd like to know any other ways as well.

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

rightx2 avatar Jul 17 '24 01:07 rightx2

AFAIK there isn't a way to do this in dbt-clickhouse yet -- I was thinking that one solution could be to add some kind of flag in the config like create_target_table, so that only one of the MVs will create the target table

Other relevant issue: https://github.com/ClickHouse/dbt-clickhouse/issues/280

benjamin-awd avatar Jul 26 '24 08:07 benjamin-awd

I was exploring this today, I've come up with a proposal which I think is fairly easy to implement. I will hopefully make a PR tomorrow to demonstrate it clearly. I would appreciate your thoughts on this idea. #280

the4thamigo-uk avatar Oct 07 '24 21:10 the4thamigo-uk

@BentsiLeviav can we close this? or do we need to raise a follow up issue?

laeg avatar Mar 06 '25 12:03 laeg

Thanks to @the4thamigo-uk, we released this feature in release 1.8.5.

Closing this, if you have any questions/concerts, feel free to reopen this.

BentsiLeviav avatar Mar 06 '25 12:03 BentsiLeviav