dbt-clickhouse icon indicating copy to clipboard operation
dbt-clickhouse copied to clipboard

Add backfill functionality to Materialized Views

Open saurabhbikram opened this issue 2 years ago • 4 comments

I have been using a materialized view as a materialization privately for a few months and would like to share it with the community. However, it is a little bit funky because of the complication for backfill so I just wanted to get @genzgd's opinion before making a PR to see if it's worthwhile?

Below is the skeleton of what the materialization does

Suppose the dbt model matview.sql is

{{ config(..) }}
select a, sum(b) from {{ ref('some_table') }} where inserted_at > _BUILD_TIME group by a

Then the materialised view materialisation does the following

  1. Create table matview_tmp - destination table that holds historic data.
  2. Create table matview - destination table that holds new incoming data for mv.
  3. set _BUILD_TIME to 3secs in future
  4. create or replace mat view matview_mv that contains the sql trigger which sends data to matview for rows inserted insome_table after _BUILD_TIME
  5. Insert into matview_tmp fromsome_table for rows before _BUILD_TIME (this can be slow if the ref is a large table)
  6. Wait until after _BUILD_TIME
  7. exchange tablesmatview and matview_tmp
  8. Insert rows intomatview from matview_tmp as this is data that was sent by mv whilst we were backfilling into matview_tmp

saurabhbikram avatar Feb 07 '23 14:02 saurabhbikram

There is a MR for that I think: https://github.com/ClickHouse/dbt-clickhouse/pull/123

bx2 avatar May 01 '23 07:05 bx2

We've now incorporated a PR which allowed ClickHouse Materialized Views using the TO format (which is strongly recommended) in the latest 1.6.0 release.

genzgd avatar Nov 30 '23 01:11 genzgd

@genzgd Does #207 implement the algorithm mentioned in this issue? After checked the code, the implementation seems quite simple, it doesn't populate/backfill any data.

oliverzy avatar Nov 30 '23 09:11 oliverzy

No, #207 does not populate data, it is indeed quite simple and doesn't support backfill. I should have been more explicit in my comment beofre closing this issue.

Given how ClickHouse materialized views work (and my own practical experience with backfilling them), that would be a fragile and potentially very expensive operation as evidenced by the number of steps above. However, a separate 'backfill' enhancement to the new implementation is something I could see with a "use at your own risk" warning.

genzgd avatar Nov 30 '23 10:11 genzgd