BigQuery materialized view is recreated each time the Dataform project is run

Open p13rr0m opened this issue 1 year ago • 1 comments

BigQuery Materialized View Issue

We have a very large table in BigQuery and have created a smaller, filtered materialized view of it for analysts. Each day, new data is added to the large table and subsequently to the materialized view as well.

We use the Dataform CLI to run the models. However, even though we haven't changed the materialized view's definition, it is recreated every time we run the Dataform project, so all the data in the large table has to be processed again.

We would expect that the materialized view keeps the previously processed data.

This is how we create the materialized view:

config {
  type: "view",
  materialized: true,
  bigquery: {
    additionalOptions: {
      enable_refresh: "false"
    },
    partitionBy: "DATE(ingestion_time)",
    clusterBy: ["column_a"]
  }
}
SELECT
    *
FROM
    ${ref("large_table")}
WHERE
    column_b

Thanks for your help!

p13rr0m avatar Aug 28 '24 11:08 p13rr0m

Agreed, this somewhat defeats the purpose of a materialized view. It would be great to see some smart checking, e.g.:

  • Does a materialized view with the same query / schema / partitioning / clustering already exist? If so, do not recreate it.

justinaugust avatar Nov 26 '24 23:11 justinaugust
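
For illustration, the check suggested above could look roughly like the sketch below. This is only a sketch under assumptions, not how Dataform actually works: needsRecreate, normalizeQuery, and the shape of the existing / desired objects are hypothetical names invented for this example, and options such as enable_refresh are left out for brevity.

```javascript
// Hypothetical helper: none of these names come from Dataform or BigQuery.
// `existing` describes the materialized view that is currently deployed
// (or null if there is none); `desired` describes the one compiled from
// the project. Both are plain objects: { query, partitionBy, clusterBy }.

// Collapse whitespace so that purely cosmetic formatting differences
// in the SQL text do not count as a change.
function normalizeQuery(sql) {
  return sql.replace(/\s+/g, " ").trim();
}

// Return true only if the view is missing or its definition differs.
function needsRecreate(existing, desired) {
  if (existing === null) {
    return true; // nothing deployed yet, so it has to be created
  }
  return (
    normalizeQuery(existing.query) !== normalizeQuery(desired.query) ||
    existing.partitionBy !== desired.partitionBy ||
    // Clustering column order matters, so compare the arrays in order.
    JSON.stringify(existing.clusterBy) !== JSON.stringify(desired.clusterBy)
  );
}

// Usage with the definition from this issue (table name simplified):
const deployed = {
  query: "SELECT * FROM large_table WHERE column_b",
  partitionBy: "DATE(ingestion_time)",
  clusterBy: ["column_a"],
};
const compiled = {
  query: "SELECT *\nFROM large_table\nWHERE column_b",
  partitionBy: "DATE(ingestion_time)",
  clusterBy: ["column_a"],
};
console.log(needsRecreate(deployed, compiled)); // false
```

With a check like this, an unchanged definition would be skipped and the previously processed data would be kept, while a change to the query, partitioning, or clustering would still trigger a recreate.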