dbt-athena Use unique temporary table name + Check schema change

Open tuan-seek opened this issue 4 years ago • 4 comments

Changes in this PR:

Add the ability to create unique temporary table names. This enables running multiple dbt run invocations concurrently against the same incremental model.
Implement a fail_fast mode for when the schema changes.
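To make the two changes concrete, here is a minimal sketch of the idea, not the code from this PR. The function names, the `__dbt_tmp_` naming scheme, and the column-to-type dictionaries are all assumptions made for illustration; the adapter itself may do this differently (for example in Jinja macros).

```python
import uuid
from typing import Dict


def unique_tmp_table_name(base_name: str) -> str:
    """Return base_name with a tmp marker and a random suffix (hypothetical scheme)."""
    # uuid4().hex is 32 lowercase hex characters with no dashes, so two
    # concurrent runs of the same model should not pick the same tmp table.
    return f"{base_name}__dbt_tmp_{uuid.uuid4().hex}"


class SchemaChangedError(Exception):
    """Raised when the target table and the new result set disagree on columns."""


def check_schema_fail_fast(
    target_columns: Dict[str, str], new_columns: Dict[str, str]
) -> None:
    """Raise immediately if column names or types differ.

    Both arguments map column name -> column type (an assumed representation).
    """
    if target_columns == new_columns:
        return
    added = sorted(set(new_columns) - set(target_columns))
    removed = sorted(set(target_columns) - set(new_columns))
    retyped = sorted(
        name
        for name in set(target_columns) & set(new_columns)
        if target_columns[name] != new_columns[name]
    )
    raise SchemaChangedError(
        f"added={added} removed={removed} retyped={retyped}"
    )
```

With a fixed tmp table name, two concurrent runs of the same incremental model would both try to create and drop the same table; a per-run suffix avoids that collision.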

Nov 25 '21 07:11 tuan-seek

I've also opened a separate issue: https://github.com/Tomme/dbt-athena/issues/62. This seems to do exactly that.

Who can review and merge it please?

Mar 10 '22 22:03 Antauri

We'd require this for a performance boost on our queries. Can it be merged?

Mar 22 '22 14:03 Antauri

I've tested this on my own fork with 12 parallel executions (12 batches in parallel for the same hour, each covering a distinct set of minutes from that hour of data), and I confirm it works. If you're going to run dbt in parallel on the same model using different "vars" (like the batch number), then at the initial table creation you'll get 12 CTAS queries instead of 1 CTAS + 11 ITAS (insert-into-as-select) queries, but that can be worked around.
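For anyone wanting to reproduce this kind of setup, a rough sketch of the batching might look like the following. The contiguous 5-minute split, the `batch` var name, and the model name are assumptions for illustration, not the exact setup described above; `--select` and `--vars` are standard dbt CLI options.

```python
import json
from typing import List


def minute_batches(batch_count: int = 12) -> List[List[int]]:
    """Split minutes 0..59 into batch_count contiguous, non-overlapping groups."""
    if batch_count <= 0 or 60 % batch_count != 0:
        raise ValueError("batch_count must be a positive divisor of 60")
    size = 60 // batch_count
    return [list(range(i * size, (i + 1) * size)) for i in range(batch_count)]


def dbt_commands(model: str, batch_count: int = 12) -> List[List[str]]:
    """Build one `dbt run` argv per batch, passing the batch number via --vars."""
    # --vars takes a YAML string; JSON is valid YAML, so json.dumps is enough.
    return [
        ["dbt", "run", "--select", model, "--vars", json.dumps({"batch": n})]
        for n in range(batch_count)
    ]
```

Each argv list could then be handed to something like `subprocess.Popen` so that all batches run at once, with the model using the `batch` var to filter its own set of minutes.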

It would be lovely to get this merged into the main trunk. This feature enables parallel queries on Athena and gets us down from 20m/hour to 4m/hour by running distinct sets of batches on the same partition (hourly in our case).

Mar 23 '22 14:03 Antauri

@tuan-seek and @Antauri, I'm quite interested in this feature. In case you're not aware, the community decided to fork Tomme/dbt-athena to have a more community-friendly setup for changes. The new fork is here: https://github.com/dbt-athena/dbt-athena, and it is available via pip too.

That said, could you tell me how it is possible in your setup to have tmp tables with the same name?

Nov 29 '22 08:11 nicor88
