dbt-databricks

1.7.0+ always drop/creates streaming tables, rather than refreshing them if possible

Open mhenniges opened this issue 1 year ago • 2 comments

Describe the bug

As of 1.7, a streaming table is always dropped and recreated when a model materialized as 'streaming_table' is run, rather than simply refreshed.

Steps To Reproduce

  • create a streaming table model, e.g.
{{ config(
    materialized = 'streaming_table',
)}}

SELECT *, CURRENT_TIMESTAMP() as processed_time
FROM STREAM read_files( '{{ var("my_s3_bucket") }}', format => 'json' )

  • Run the model multiple times. On every run after the first, dbt will output "Dropping relation <relation name> because it is of type table" and issue a "drop table" statement followed by a "create streaming table" statement, rather than just a "create or refresh streaming table" statement.

Expected behavior

Running a streaming table model against a previously created streaming table should cause a refresh, not a recreate (except for changes that force a full refresh, of course).

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

Various. Initially discovered with dbt-databricks and dbt-core 1.7.3,
but verified that it first occurs in dbt-databricks 1.7.0.

The operating system you're using:

OSX Ventura 13.3.1

The output of python --version: Python 3.10.11

Additional context

It appears to me the issue is in the _parse_type method introduced in PR 499, the body of which is:

    def _parse_type(self, information: str) -> str:
        type_entry = [
            entry.strip() for entry in information.split("\n") if entry.split(":")[0] == "Type"
        ]
        return type_entry[0] if type_entry else ""

The return value from this method is compared to "STREAMING_TABLE", but as written it returns the whole line, "Type: STREAMING_TABLE", so the comparison never matches. The return value needs to be split to solve it, though that is not particularly pretty:

        type_entry = [
            entry.split(":")[1].strip() for entry in information.split("\n") if entry.split(":")[0] == "Type"
        ]

_parse_type only seems to be called for this particular check, so there shouldn't be side effects of this change, but I have not tested for that at all.
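To make the mismatch concrete, here is a minimal standalone sketch. It assumes the information text is a newline-separated list of "Key: value" lines; the sample string and both function names are made up for illustration and are not taken from dbt-databricks or from a real run. The second function uses str.partition instead of the split(":")[1] shown above, purely as a stylistic alternative.

```python
# Hypothetical stand-in for the relation's information text; the real content
# may contain different keys, but the "Type: ..." line is the part that matters.
SAMPLE_INFORMATION = (
    "Catalog: main\n"
    "Database: analytics\n"
    "Table: events\n"
    "Type: STREAMING_TABLE\n"
    "Provider: delta"
)


def parse_type_as_released(information: str) -> str:
    """Same logic as the method body quoted above: returns the whole 'Type' line."""
    type_entry = [
        entry.strip() for entry in information.split("\n") if entry.split(":")[0] == "Type"
    ]
    return type_entry[0] if type_entry else ""


def parse_type_value_only(information: str) -> str:
    """Illustrative variant: returns only the text after the first colon."""
    for entry in information.split("\n"):
        key, sep, value = entry.partition(":")
        if sep and key == "Type":
            return value.strip()
    return ""


if __name__ == "__main__":
    # The released logic keeps the "Type: " prefix, so the equality check fails.
    print(parse_type_as_released(SAMPLE_INFORMATION) == "STREAMING_TABLE")  # False
    # Returning only the value makes the same check succeed.
    print(parse_type_value_only(SAMPLE_INFORMATION) == "STREAMING_TABLE")  # True
```

With the prefix left in place, the relation is never recognized as a streaming table, which would be consistent with the "because it is of type table" message in the run output.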

mhenniges, Dec 15 '23 20:12

Thanks for the report and the debugging. Will incorporate the fix into the next release; due to holidays, however, this release will probably not be available until after New Year.

benc-db, Dec 18 '23 17:12

Sorry this took so long; please give 1.8.0b1 a shot: https://github.com/databricks/dbt-databricks/discussions/595

benc-db, Feb 26 '24 23:02

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

github-actions[bot], Aug 25 '24 01:08