dbt-presto icon indicating copy to clipboard operation
dbt-presto copied to clipboard

Support for incremental models

Open buremba opened this issue 5 years ago • 6 comments

This is one of the killer DBT features and I think that it should be available in Presto as well. I will try to create a PR for this one but it might take a few weeks.

If there's anyone who is willing to implement this feature in less than 2 weeks, feel free to comment.

buremba avatar May 10 '19 16:05 buremba

Thanks @buremba! How are you thinking about implementing incremental models? We like to use the merge statement on Snowflake/BigQuery, but I don't believe that's possible on Presto. For databases that don't support merge, we implement incremental models with a delete and an insert. You can find some more info on this approach here.

I think the delete + insert approach will work, but it's unfortunately not atomic as I don't believe Presto supports transactions? Do you know how most Presto projects handle incremental updates to a table? My understanding is that this is typically handled with partitions, not insert/delete statements

drewbanin avatar May 10 '19 17:05 drewbanin

@drewbanin The only way to make it work in Presto is delete + insert at the moment but Presto supports transactions to some extent. (https://prestodb.github.io/docs/current/sql/start-transaction.html)

The problem is that Presto is actually just a distributed query executor and under the hood, it has a concept of connectors which might be an RDBMS, S3, Hadoop, Elasticsearch, etc. AFAIK, only a few connectors support transactions so this feature won't be available in most of the connectors. I will look into the connectors and see which ones support transactions and will let you know.

buremba avatar May 10 '19 17:05 buremba

Are there any updates on this?

Cabeda avatar Jan 30 '20 18:01 Cabeda

Can we use Alter Table with a new location to simulate this? I believe the alter table statement is atomic. We will have to merge existing data with the incoming data and save it to a new location first, and use an alter table to point the original table to the new location. It has the drawback of not being space efficient(maybe a vacuum step) and only suited for infrequent batch uses

chickenPopcorn avatar Jul 26 '20 05:07 chickenPopcorn

They are working on merge into for hive: https://github.com/trinodb/trino/issues/7708

mdesmet avatar Aug 10 '21 18:08 mdesmet

@mdesmet I've opened an issue on the dbt-trino adapter and I'll try to have a proof-of-concept

Cabeda avatar Sep 08 '21 19:09 Cabeda