dbt-clickhouse
dbt-clickhouse copied to clipboard
Add backfill functionality to Materialized Views
I have been using a materialized view
as a materialization privately for a few months and would like to share it with the community. However, it is a little bit funky because of the complication for backfill so I just wanted to get @genzgd's opinion before making a PR to see if it's worthwhile?
Below is the skeleton of what the materialization does
Suppose the dbt model matview.sql
is
{{ config(..) }}
select a, sum(b) from {{ ref('some_table') }} where inserted_at > _BUILD_TIME group by a
Then the materialised view materialisation does the following
- Create table
matview_tmp
- destination table that holds historic data. - Create table
matview
- destination table that holds new incoming data for mv. - set
_BUILD_TIME
to 3secs in future - create or replace mat view
matview_mv
that contains the sql trigger which sends data tomatview
for rows inserted insome_table
after_BUILD_TIME
- Insert into
matview_tmp
fromsome_table
for rows before_BUILD_TIME
(this can be slow if the ref is a large table) - Wait until after
_BUILD_TIME
- exchange tables
matview
andmatview_tmp
- Insert rows into
matview
frommatview_tmp
as this is data that was sent bymv
whilst we were backfilling intomatview_tmp
There is a MR for that I think: https://github.com/ClickHouse/dbt-clickhouse/pull/123
We've now incorporated a PR which allowed ClickHouse Materialized Views using the TO
format (which is strongly recommended) in the latest 1.6.0 release.
@genzgd Does #207 implement the algorithm mentioned in this issue? After checked the code, the implementation seems quite simple, it doesn't populate/backfill any data.
No, #207 does not populate data, it is indeed quite simple and doesn't support backfill. I should have been more explicit in my comment beofre closing this issue.
Given how ClickHouse materialized views work (and my own practical experience with backfilling them), that would be a fragile and potentially very expensive operation as evidenced by the number of steps above. However, a separate 'backfill' enhancement to the new implementation is something I could see with a "use at your own risk" warning.