dbt-databricks
Add support for `INSERT ... BY NAME`
Describe the feature
Starting with Databricks Runtime 13.3, the `BY NAME` clause can be used instead of an explicit column list in an `INSERT` statement (documentation).
The feature would add support for this clause as part of the `insert_overwrite` strategy.
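For example, the behavior could be opted into per model via a config flag. A minimal sketch (the `insert_by_name` flag and the source path are hypothetical, shown only to illustrate the idea):

```sql
{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        partition_by=["creation_date"],
        insert_by_name=true  -- hypothetical flag, name TBD
    )
}}
SELECT *
FROM READ_FILES('/Volumes/my_catalog/my_schema/my_volume/*/data.csv', format => 'csv')
```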
Additional context
This feature would be very handy for ingesting files from cloud storage.
Let's say that we have this model that inserts new partitions every day:
```sql
{{ config(materialized="incremental", incremental_strategy="insert_overwrite", partition_by=["creation_date"]) }}
SELECT *
FROM READ_FILES('/Volumes/my_catalog/my_schema/my_volume/{{ var("partition") }}/data.csv', format => 'csv')
```
The first time (`partition=2023/09/06`), the ingested data looks like this:

```
creation_date,a,b
2023-09-06,1,1
2023-09-06,2,2
```
The table produced has the following schema:

```
creation_date: date, a: int, b: int
```
The second time (`partition=2023/09/07`), the ingested data looks like this:

```
creation_date,a
2023-09-07,3
2023-09-07,4
```
The column `b` is missing. At this point, the current adapter would fail because it would execute this command:

```sql
INSERT OVERWRITE TABLE my_catalog.my_schema.my_table
PARTITION (creation_date)
SELECT
    creation_date,
    a,
    b
FROM my_table__dbt_tmp -- a view corresponding to the compiled model
```

but `b` doesn't exist in the view.
This would not be the case if we instead executed this:

```sql
INSERT OVERWRITE TABLE my_catalog.my_schema.my_table
PARTITION (creation_date) BY NAME -- BY NAME this time
SELECT *
FROM my_table__dbt_tmp
```

The value for `b` would simply be `NULL`.
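As a standalone illustration of the clause's semantics on DBR 13.3+ (the table and values below are made up for this sketch): columns are matched by name rather than by position, so target columns that are absent from the query resolve to `NULL`.

```sql
CREATE TABLE my_table (creation_date DATE, a INT, b INT);

-- b is not selected; BY NAME matches creation_date and a by name
-- and fills b with NULL instead of failing on a column-count mismatch.
INSERT INTO my_table BY NAME
SELECT DATE'2023-09-07' AS creation_date, 3 AS a;
```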
Who will this benefit?
People who want to ingest "bronze" data (with schema evolution in mind) even when some columns are absent.
Related issues
#334