dbt-external-tables
Spectrum `STORED AS PARQUET` does not output expected DDL
Describe the bug
When defining external tables in Redshift Spectrum stored as Parquet, dbt-external-tables does not generate the expected DDL, rendering the external table unreadable.
Steps to reproduce
Config:

```yaml
version: 2

sources:
  - name: spectrum
    schema: spectrum
    loader: S3
    loaded_at_field: loaded_at
    tables:
      - name: abc
        external:
          location: ...
          stored_as: PARQUET
```
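The report doesn't show the staging step; presumably the table is created with the package's documented run-operation:

```shell
dbt run-operation stage_external_sources
```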
Expected results
`SHOW EXTERNAL TABLE spectrum.abc` should yield:

```sql
CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'xyz';
```
This is what `SHOW EXTERNAL TABLE` returns when I create the table manually with:

```sql
CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
STORED AS PARQUET
LOCATION 'xyz';
```
Actual results
When the table is created via dbt-external-tables, `SHOW EXTERNAL TABLE spectrum.abc` instead returns:

```sql
CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'xyz';
```
System information
```yaml
packages:
  - package: dbt-labs/codegen
    version: 0.9.0
  - package: dbt-labs/redshift
    version: 0.8.0
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: dbt-labs/metrics
    version: 1.4.1
  - package: dbt-labs/dbt_external_tables
    version: 0.8.3
```
Which database are you using dbt with?
- [x] redshift
- [ ] snowflake
- [ ] other (specify: ____________)
The output of `dbt --version`:

```
Core:
  - installed: 1.4.5
  - latest:    1.4.5 - Up to date!

Plugins:
  - redshift: 1.4.0 - Up to date!
  - postgres: 1.4.5 - Up to date!
```
The output of `python --version`:
Python 3.9.0
Additional context
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Bump
```yaml
row_format: serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
file_format: parquet
```

The above works for me; there is no need for stored_as.
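For reference, here is a minimal sketch of the original source spec with that workaround applied (source and table names are taken from the reproduction above; the S3 location remains elided):

```yaml
version: 2

sources:
  - name: spectrum
    schema: spectrum
    loader: S3
    loaded_at_field: loaded_at
    tables:
      - name: abc
        external:
          location: ...  # elided in the original report
          row_format: serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
          file_format: parquet
```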
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.