dbt-external-tables
dbt-athena-community support
Description & motivation
resolves #274
PR based on https://github.com/dbt-labs/dbt-external-tables/pull/133
- uses dbt-athena-community > 1.4.1; not yet compatible with earlier versions
- removed the JSON-format source and tests for Athena; the data needs to be JSONL to work with Athena
- PR raised for info and to check tests pass
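On the JSONL point above, here is a minimal sketch of converting a file holding a top-level JSON array into JSONL (one record per line). The function name `json_array_to_jsonl` is hypothetical and not part of this PR, and the sketch assumes the source data is a single JSON array of records.

```python
import json


def json_array_to_jsonl(text: str) -> str:
    """Convert a top-level JSON array into JSONL: one compact record per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep each record on a single physical line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


if __name__ == "__main__":
    print(json_array_to_jsonl('[{"id": 1, "first_name": "a"}, {"id": 2, "first_name": "b"}]'), end="")
```

The output of a conversion like this could then be uploaded to the S3 location the external source points at.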
Checklist
- [/] I have verified that these changes work locally
- [/] I have updated the README.md (if applicable)
- [/] I have added an integration test for my fix/feature (if applicable)
Local run for info:
$ ./run_test.sh athena
Setting up virtual environment
Changing working directory: integration_tests
Starting integration tests
15:59:49 Running with dbt=1.4.6
15:59:50 Installing ../
15:59:50 Installed from <local @ ../>
15:59:50 Installing dbt-labs/dbt_utils
15:59:50 Installed from version 0.8.0
15:59:50 Updated version available: 1.1.0
15:59:50
15:59:50 Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps
15:59:52 Running with dbt=1.4.6
15:59:52 Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
15:59:52
15:59:54 Concurrency: 1 threads (target='athena')
15:59:54
15:59:54 1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
16:00:04 1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 10.79s]
16:00:04
16:00:04 Finished running 1 seed in 0 hours 0 minutes and 12.15 seconds (12.15s).
16:00:04
16:00:04 Completed successfully
16:00:04
16:00:04 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
16:00:07 Running with dbt=1.4.6
16:00:07 No prep necessary, skipping
16:00:09 Running with dbt=1.4.6
16:00:09 Unable to do partial parsing because config vars, config profile, or config target have changed
16:00:11 1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:00:12 1 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...
16:00:13 1 of 4 (1) OK -1
16:00:13 1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...
16:00:15 1 of 4 (2) OK -1
16:00:15 2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:00:17 2 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...
16:00:18 2 of 4 (1) OK -1
16:00:18 2 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...
16:00:20 2 of 4 (2) OK -1
16:00:20 2 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:00:23 2 of 4 (3) OK -1
16:00:23 3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:00:26 3 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...
16:00:27 3 of 4 (1) OK -1
16:00:27 3 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...
16:00:29 3 of 4 (2) OK -1
16:00:29 3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:01:06 3 of 4 (3) OK -1
16:01:06 3 of 4 (4) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:01:42 3 of 4 (4) OK -1
16:01:42 3 of 4 (5) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:02:16 3 of 4 (5) OK -1
16:02:16 4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:02:17 4 of 4 (1) drop table if exists `awsdatacatalog`.`dbt_external_tables_integration_tests_ath...
16:02:19 4 of 4 (1) OK -1
16:02:19 4 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...
16:02:20 4 of 4 (2) OK -1
16:02:20 4 of 4 (3) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...
16:02:25 4 of 4 (3) OK -1
16:02:27 Running with dbt=1.4.6
16:02:27 Unable to do partial parsing because config vars, config profile, or config target have changed
16:02:29 1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
16:02:30 1 of 4 SKIP
16:02:30 2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
16:02:33 2 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:02:36 2 of 4 (1) OK -1
16:02:36 3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
16:02:39 3 of 4 (1) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:03:15 3 of 4 (1) OK -1
16:03:15 3 of 4 (2) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:03:51 3 of 4 (2) OK -1
16:03:51 3 of 4 (3) alter table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
16:04:25 3 of 4 (3) OK -1
16:04:25 4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
16:04:26 4 of 4 (1) msck repair table `awsdatacatalog`.`dbt_external_tables_integration_tests_athena...
16:04:31 4 of 4 (1) OK -1
16:04:33 Running with dbt=1.4.6
16:04:33 Found 0 models, 2 tests, 0 snapshots, 0 analyses, 560 macros, 0 operations, 1 seed file, 5 sources, 0 exposures, 0 metrics
16:04:33
16:04:34 Concurrency: 1 threads (target='athena')
16:04:34
16:04:34 1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [RUN]
16:04:38 1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [PASS in 3.84s]
16:04:38 2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [RUN]
16:04:41 2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [PASS in 2.77s]
16:04:41
16:04:41 Finished running 2 tests in 0 hours 0 minutes and 7.58 seconds (7.58s).
16:04:41
16:04:41 Completed successfully
16:04:41
16:04:41 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
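For readers skimming the log, the statement sequence it shows can be summarised in a rough Python sketch. This is an illustration inferred from the log output only, not the macro code in this PR: the names `ExternalSource` and `plan_steps` are hypothetical, and treating the first pass as a full refresh is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ExternalSource:
    name: str
    partitioned: bool = False
    hive_compatible: bool = False


def plan_steps(source: ExternalSource, table_exists: bool, full_refresh: bool) -> list:
    """Return the kinds of statements to run for one source, as seen in the log."""
    steps = []
    # Assumption: a (re)create happens when the table is missing or a full refresh is requested.
    if full_refresh or not table_exists:
        steps.append("drop table if exists")
        steps.append("create external table")
    # Partitioned sources are refreshed on every run.
    if source.partitioned:
        steps.append("msck repair table" if source.hive_compatible else "alter table")
    return steps or ["SKIP"]


if __name__ == "__main__":
    src = ExternalSource("people_csv_partitioned", partitioned=True)
    print(plan_steps(src, table_exists=False, full_refresh=True))
    print(plan_steps(src, table_exists=True, full_refresh=False))
```

The log shows several `alter table` statements for the multipartitioned source; the sketch collapses them into one step because the statements are truncated in the output.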
Re: https://github.com/dbt-labs/dbt-external-tables/pull/133#discussion_r811471521 (the need for a quote_comment: key to get around invalid comment chars): this fix doesn't seem to work, at least in Athena engine v3. The whole query gets commented out:
16:09:31 1 of 4 (2) create external table `awsdatacatalog`.`dbt_external_tables_integration_tests_at...
16:10:52 Encountered an error while running operation: Runtime Error
Runtime Error
[ErrorCategory:USER_ERROR, ErrorCode:DDL_FAILED], Detail:FAILED: ParseException line 1:489 cannot recognize input near '<EOF>' '<EOF>' '<EOF>'
Note - https://github.com/dbt-athena/dbt-athena/pull/161 effectively added a large subset of external tables functionality in dbt-athena itself. Might be worth trying to refactor that and utilise it to cut down on the duplicated logic in here
@brabster what's needed to get this PR approved? I'm happy to contribute.
@aidan-o-boyle-kroo hi there! I've just pulled this, it is still working on dbt-athena-community 1.4.6 and works against latest 1.6.1 too.
$ ATHENA_TEST_DBNAME=AwsDataCatalog AWS_REGION=eu-west-2 ATHENA_TEST_BUCKET=my-redacted_bucket ATHENA_TEST_WORKGROUP=primary ./run_test.sh athena
Setting up virtual environment for dbt-athena
Changing working directory: integration_tests
Starting integration tests
19:25:28 Running with dbt=1.6.3
19:25:29 Installing ../
19:25:29 Installed from <local @ ../>
19:25:29 Installing dbt-labs/dbt_utils
19:25:29 Installed from version 0.8.0
19:25:29 Updated version available: 1.1.1
19:25:29
19:25:29 Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps
19:25:32 Running with dbt=1.6.3
19:25:32 Registered adapter: athena=1.6.1
19:25:32 Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:34 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:34
19:25:37 Concurrency: 1 threads (target='athena')
19:25:37
19:25:37 1 of 1 START seed file dbt_external_tables_integration_tests_athena.people ..... [RUN]
19:25:48 1 of 1 OK loaded seed file dbt_external_tables_integration_tests_athena.people . [CREATE 200 in 11.03s]
19:25:48
19:25:48 Finished running 1 seed in 0 hours 0 minutes and 14.18 seconds (14.18s).
19:25:48
19:25:48 Completed successfully
19:25:48
19:25:48 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
19:25:51 Running with dbt=1.6.3
19:25:51 Registered adapter: athena=1.6.1
19:25:51 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:51 No prep necessary, skipping
19:25:54 Running with dbt=1.6.3
19:25:54 Registered adapter: athena=1.6.1
19:25:54 Unable to do partial parsing because config vars, config profile, or config target have changed
19:25:57 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:25:57 1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:25:58 1 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...
19:25:59 1 of 4 (1) OK -1
19:25:59 1 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...
19:26:01 1 of 4 (2) OK -1
19:26:01 2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:26:02 2 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...
19:26:03 2 of 4 (1) OK -1
19:26:03 2 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...
19:26:05 2 of 4 (2) OK -1
19:26:05 2 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:26:08 2 of 4 (3) OK -1
19:26:08 3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:26:10 3 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...
19:26:11 3 of 4 (1) OK -1
19:26:11 3 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...
19:26:12 3 of 4 (2) OK -1
19:26:12 3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:26:49 3 of 4 (3) OK -1
19:26:49 3 of 4 (4) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:27:25 3 of 4 (4) OK -1
19:27:25 3 of 4 (5) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:27:59 3 of 4 (5) OK -1
19:27:59 4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:27:59 4 of 4 (1) drop table if exists `AwsDataCatalog`.`dbt_external_tables_integration_tests_ath...
19:28:00 4 of 4 (1) OK -1
19:28:00 4 of 4 (2) create external table `AwsDataCatalog`.`dbt_external_tables_integration_tests_at...
19:28:01 4 of 4 (2) OK -1
19:28:01 4 of 4 (3) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...
19:28:04 4 of 4 (3) OK -1
19:28:07 Running with dbt=1.6.3
19:28:07 Registered adapter: athena=1.6.1
19:28:07 Unable to do partial parsing because config vars, config profile, or config target have changed
19:28:09 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:28:09 1 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_unpartitioned
19:28:10 1 of 4 SKIP
19:28:10 2 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_partitioned
19:28:12 2 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:28:16 2 of 4 (1) OK -1
19:28:16 3 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned
19:28:17 3 of 4 (1) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:28:53 3 of 4 (1) OK -1
19:28:53 3 of 4 (2) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:29:29 3 of 4 (2) OK -1
19:29:29 3 of 4 (3) alter table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena`.`peo...
19:30:03 3 of 4 (3) OK -1
19:30:03 4 of 4 START external source dbt_external_tables_integration_tests_athena.people_csv_multipartitioned_hive_compatible
19:30:04 4 of 4 (1) msck repair table `AwsDataCatalog`.`dbt_external_tables_integration_tests_athena...
19:30:06 4 of 4 (1) OK -1
19:30:09 Running with dbt=1.6.3
19:30:09 Registered adapter: athena=1.6.1
19:30:09 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 619 macros, 0 groups, 0 semantic models
19:30:09
19:30:10 Concurrency: 1 threads (target='athena')
19:30:10
19:30:11 1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [RUN]
19:30:14 1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [PASS in 3.83s]
19:30:14 2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [RUN]
19:30:17 2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [PASS in 2.77s]
19:30:17
19:30:17 Finished running 2 tests in 0 hours 0 minutes and 7.89 seconds (7.89s).
19:30:17
19:30:17 Completed successfully
19:30:17
19:30:17 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
I'd love to get it merged, will remove the draft label. Main concerns would be:
- stability, but it hasn't broken yet and it's probably better to have support than no support
- lack of automated testing (the Circle CI break is due to unset vars pointing to AWS); this would need an AWS account setting up and paying for, and I'm unsure what the policy is on that
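On the unset-vars point, a small pre-flight check could make the CI failure more explicit. This is only a sketch: the variable names are taken from the `run_test.sh` invocation earlier in this thread, whether all four are strictly required is an assumption, and `missing_vars` is a hypothetical helper, not something in this PR.

```python
import os
import sys

# Names taken from the run_test.sh invocation shown in this thread.
REQUIRED_VARS = (
    "ATHENA_TEST_DBNAME",
    "AWS_REGION",
    "ATHENA_TEST_BUCKET",
    "ATHENA_TEST_WORKGROUP",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Skipping Athena integration tests; unset vars: " + ", ".join(missing))
        sys.exit(1)
    print("All Athena test variables are set")
```

A CI job could run a check like this first and skip or fail fast with a readable message when no AWS configuration is available.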
I am depending on my fork for multiple projects now, so I guess you can kick the tyres and check it's working for you that way.
Could we mock Athena? https://github.com/getmoto/moto/blob/master/IMPLEMENTATION_COVERAGE.md#athena
We could - I'm not sure how effective a test that would be, and I'm not sure what the maintainers need in order to merge the PR. @jeremyyeo can you advise on what we'd need to do to get this PR merged in? :bowing_man:
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.
I'd also like to contribute whatever it takes to get this merged. This would be really helpful for our team.
@dataders who should we add as a reviewer to merge this one? 🙏🏻 Quite a few folks from the community have mentioned dbt-external-tables on a few occasions.
I've just set it up again with latest dbt-athena-community against my personal AWS account. All appears to still be working fine, integration tests run and pass. I've added an example of minimal IAM permissions and defaulted a config value to assist with any future test automation setup. Also checked that the implementation does its own drop-if logic and so doesn't appear to inherit any inappropriate housekeeping behaviour from the adapter.
(venv) @brabster ➜ /workspaces/dbt-external-tables/integration_tests (dbt-athena-community-support) $ dbt test --target athena
16:59:28 Running with dbt=1.7.13
16:59:29 Registered adapter: athena=1.7.2
16:59:29 Found 1 seed, 2 tests, 5 sources, 0 exposures, 0 metrics, 683 macros, 0 groups, 0 semantic models
16:59:29
16:59:30 Concurrency: 1 threads (target='athena')
16:59:30
16:59:30 1 of 2 START test dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [RUN]
16:59:32 1 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_partitioned_id__first_name__last_name__email__ref_people_ [PASS in 2.59s]
16:59:32 2 of 2 START test dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [RUN]
16:59:35 2 of 2 PASS dbt_utils_source_equality_athena_external_people_csv_unpartitioned_id__first_name__last_name__email__ref_people_ [PASS in 2.46s]
16:59:35
16:59:35 Finished running 2 tests in 0 hours 0 minutes and 5.65 seconds (5.65s).
16:59:35
16:59:35 Completed successfully
16:59:35
16:59:35 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2