dbt-athena
Use unique temporary table name + Check schema change
Changes in this PR:
- Add the ability to create unique temporary table names. This enables running multiple `dbt run` invocations concurrently for an incremental model.
- Implement `fail_fast` mode when the schema changes.
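To illustrate the first change, here is a minimal Python sketch of how a unique temporary table name could be generated per run. It is not the code from this PR: the function name `unique_tmp_table_name`, the `__dbt_tmp` suffix default, and the use of a UUID are all assumptions made for illustration.

```python
import re
import uuid


def unique_tmp_table_name(base_name: str, suffix: str = "__dbt_tmp") -> str:
    """Return a temporary table name that differs on every call.

    Hypothetical helper: the naming scheme is an assumption, not the PR's.
    """
    # uuid4().hex is 32 lowercase hex characters, so two concurrent runs
    # are extremely unlikely to pick the same name.
    token = uuid.uuid4().hex
    name = f"{base_name}{suffix}_{token}".lower()
    # Replace anything other than lowercase letters, digits and underscores.
    return re.sub(r"[^a-z0-9_]", "_", name)


if __name__ == "__main__":
    print(unique_tmp_table_name("events"))
    print(unique_tmp_table_name("events"))
```

With a scheme like this, two concurrent runs of the same incremental model would each stage their data in a separate temporary table instead of colliding on one shared name.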
I've also opened a separate issue, https://github.com/Tomme/dbt-athena/issues/62, and this PR seems to do exactly that.
Who can review and merge it please?
We'd require this for a performance boost on our queries. Can it be merged?
I've tested this on my own fork with 12 parallel executions (12 batches in parallel for the same hour, each covering a distinct set of minutes from that hour of data), and I can confirm it works. If you run dbt in parallel on the same model using different "vars" (like the batch number), then at initial table creation you'll get 12 CTAS queries instead of 1 CTAS + 11 ITAS (insert-into-as-select) queries, but that can be worked around.
It would be lovely to get this merged into the main trunk. This feature enables parallel queries on Athena and gets us down from 20m/hour to 4m/hour by running distinct sets of batches on the same partition (hourly in our case).
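As a rough sketch of the setup described in the test above, the snippet below splits one hour into 12 disjoint sets of minutes and builds one `dbt run` command per batch. The model name, the var names `batch_number` and `minutes`, and the even 5-minute split are assumptions for illustration; the commands are only built here, not executed.

```python
import json


def minute_batches(num_batches: int = 12, minutes_per_hour: int = 60) -> list[list[int]]:
    """Split the minutes of one hour into equally sized, disjoint batches."""
    if minutes_per_hour % num_batches != 0:
        raise ValueError("num_batches must divide minutes_per_hour evenly")
    size = minutes_per_hour // num_batches
    return [list(range(i * size, (i + 1) * size)) for i in range(num_batches)]


def dbt_commands(model: str, batches: list[list[int]]) -> list[list[str]]:
    """Build one `dbt run` argument list per batch (hypothetical var names)."""
    commands = []
    for batch_number, minutes in enumerate(batches):
        # --vars takes a YAML string; JSON is valid YAML, so json.dumps works.
        dbt_vars = json.dumps({"batch_number": batch_number, "minutes": minutes})
        commands.append(["dbt", "run", "--select", model, "--vars", dbt_vars])
    return commands


if __name__ == "__main__":
    for cmd in dbt_commands("my_incremental_model", minute_batches()):
        print(" ".join(cmd))
```

Each argument list could then be handed to a process pool or an orchestrator so that the 12 batches run side by side against the same hourly partition.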
@tuan-seek and @Antauri I'm quite interested in this feature. In case you are not aware, the community decided to fork Tomme/dbt-athena to have a more community-friendly setup for changes. The new fork is here: https://github.com/dbt-athena/dbt-athena, and it is also available on pip.
That said, could you tell me how it is possible in your setup to have tmp tables with the same name?