Add backfill functionality to Materialized Views
I have been using a materialized view as a materialization privately for a few months and would like to share it with the community. However, it is a little bit funky because of the complication for backfill so I just wanted to get @genzgd's opinion before making a PR to see if it's worthwhile?
Below is the skeleton of what the materialization does
Suppose the dbt model matview.sql is
{{ config(..) }}
select a, sum(b) from {{ ref('some_table') }} where inserted_at > _BUILD_TIME group by a
Then the materialised view materialisation does the following
- Create table
matview_tmp- destination table that holds historic data. - Create table
matview- destination table that holds new incoming data for mv. - set
_BUILD_TIMEto 3secs in future - create or replace mat view
matview_mvthat contains the sql trigger which sends data tomatviewfor rows inserted insome_tableafter_BUILD_TIME - Insert into
matview_tmpfromsome_tablefor rows before_BUILD_TIME(this can be slow if the ref is a large table) - Wait until after
_BUILD_TIME - exchange tables
matviewandmatview_tmp - Insert rows into
matviewfrommatview_tmpas this is data that was sent bymvwhilst we were backfilling intomatview_tmp
There is a MR for that I think: https://github.com/ClickHouse/dbt-clickhouse/pull/123
We've now incorporated a PR which allowed ClickHouse Materialized Views using the TO format (which is strongly recommended) in the latest 1.6.0 release.
@genzgd Does #207 implement the algorithm mentioned in this issue? After checked the code, the implementation seems quite simple, it doesn't populate/backfill any data.
No, #207 does not populate data, it is indeed quite simple and doesn't support backfill. I should have been more explicit in my comment beofre closing this issue.
Given how ClickHouse materialized views work (and my own practical experience with backfilling them), that would be a fragile and potentially very expensive operation as evidenced by the number of steps above. However, a separate 'backfill' enhancement to the new implementation is something I could see with a "use at your own risk" warning.