dbt-athena

Support specifying compression type for Parquet and ORC models

Open aut0clave opened this issue 3 years ago • 2 comments

It would be helpful if dbt-athena supported specifying the parquet_compression and orc_compression properties for models.

By default, Athena will use GZIP compression for Parquet and ORC tables, but supports several other compression formats (docs). Generally speaking, SNAPPY is faster to read/write, but GZIP yields better compression ratios.

It might also be worth exploring using SNAPPY as the default compression format for Parquet in dbt-athena.
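
For reference, a minimal sketch of what the request maps to in plain Athena CTAS. The database, table, and source names are hypothetical; `parquet_compression` is the table property mentioned above. How dbt-athena would expose this in a model config is not decided here.

```sql
-- Hypothetical names: example_db, events_snappy, events_raw.
-- parquet_compression overrides Athena's default compression for the Parquet output.
CREATE TABLE example_db.events_snappy
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY'
) AS
SELECT *
FROM example_db.events_raw;
```

An ORC table would use `format = 'ORC'` with `orc_compression` in the same position.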

aut0clave avatar Sep 14 '21 12:09 aut0clave

Or, better yet, simply support the new-as-of-yesterday write_compression parameter that works for all output types. Release note here: https://docs.aws.amazon.com/athena/latest/ug/release-note-2021-09-16.html

There is also documentation on which formats support which compression types, which might complicate the implementation here.
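
A rough sketch of the same idea with the generic property, assuming the `write_compression` name from the release note linked above. Table names are hypothetical, and whether a given format/compression pair is accepted depends on the compatibility documentation mentioned above.

```sql
-- Hypothetical names: example_db, events_orc, events_raw.
-- write_compression is not tied to a single format, so the same key
-- could be passed through regardless of the model's output format.
CREATE TABLE example_db.events_orc
WITH (
    format = 'ORC',
    write_compression = 'SNAPPY'
) AS
SELECT *
FROM example_db.events_raw;
```

With a single property, the adapter would only need to forward one value rather than choosing between `parquet_compression` and `orc_compression` based on the format.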

aut0clave avatar Sep 17 '21 19:09 aut0clave

@Tomme can you mark this issue as closed by #53?

owenprough-sift avatar Oct 04 '22 12:10 owenprough-sift