dbt-athena
Support specifying compression type for Parquet and ORC models
It would be helpful if dbt-athena supported specifying the parquet_compression and orc_compression properties for models.
By default, Athena uses GZIP compression for Parquet and ORC tables, but it supports several other compression formats (see the Athena docs). Generally speaking, SNAPPY is faster to read and write, while GZIP yields better compression ratios.
It might also be worth exploring SNAPPY as the default compression format for Parquet in dbt-athena.
Or, better yet, simply support the write_compression parameter, new as of yesterday, which works for all output formats. Release note: https://docs.aws.amazon.com/athena/latest/ug/release-note-2021-09-16.html
There is also documentation on which formats support which compression types, which might complicate the implementation here.
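As a rough sketch of the two options discussed above, the hypothetical helper below builds the WITH (...) clause of an Athena CTAS statement. It either emits the format-agnostic write_compression property or falls back to the older per-format parquet_compression / orc_compression properties. The function name ctas_with_clause and its parameters are illustrative assumptions, not dbt-athena's actual code. It deliberately does not check which codecs each format accepts; that matrix is left to Athena's own validation.

```python
from typing import Optional


def ctas_with_clause(fmt: str, compression: Optional[str] = None,
                     use_write_compression: bool = True) -> str:
    """Build the WITH (...) clause of an Athena CTAS statement.

    Hypothetical helper for illustration only; not part of dbt-athena.
    """
    props = [f"format = '{fmt.upper()}'"]
    if compression is not None:
        codec = compression.upper()
        if use_write_compression:
            # Format-agnostic property described in the 2021-09-16 release note.
            props.append(f"write_compression = '{codec}'")
        elif fmt.lower() in ("parquet", "orc"):
            # Older per-format properties: parquet_compression / orc_compression.
            props.append(f"{fmt.lower()}_compression = '{codec}'")
        else:
            # Only Parquet and ORC have a dedicated per-format property here.
            raise ValueError(f"no per-format compression property for {fmt!r}")
    # Codec/format compatibility is not checked; Athena rejects invalid pairs.
    return "WITH (" + ", ".join(props) + ")"


if __name__ == "__main__":
    print(ctas_with_clause("parquet", "snappy"))
    print(ctas_with_clause("orc", "zlib", use_write_compression=False))
```

For example, ctas_with_clause("parquet", "snappy") returns WITH (format = 'PARQUET', write_compression = 'SNAPPY'). Preferring write_compression keeps a single model config key working across output formats, which is the main appeal of the newer parameter.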
@Tomme, can you mark this issue as closed by #53?