spark-redshift
Avro tempformat is slow
Hi. I would like to suggest setting the default value of `tempformat` to `CSV GZIP`. In my experience, `AVRO` is very slow.
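Until the default changes, the format can be set per write. Below is a minimal sketch of that, not code taken from the library: `redshift_write_options` is a hypothetical helper, and the JDBC URL, table name, and bucket in the usage comment are placeholders. The option keys (`url`, `dbtable`, `tempdir`, `tempformat`) are the ones the library reads; authentication options are left out.

```python
# Values that this thread lists as accepted by the tempformat option.
SUPPORTED_TEMPFORMATS = ("AVRO", "CSV", "CSV GZIP")


def redshift_write_options(url, dbtable, tempdir, tempformat="CSV GZIP"):
    """Build the option map for a spark-redshift write (hypothetical helper).

    Defaults tempformat to CSV GZIP instead of relying on the library default.
    """
    if tempformat not in SUPPORTED_TEMPFORMATS:
        raise ValueError(
            f"unsupported tempformat {tempformat!r}; "
            f"expected one of {SUPPORTED_TEMPFORMATS}"
        )
    return {
        "url": url,
        "dbtable": dbtable,
        "tempdir": tempdir,
        "tempformat": tempformat,
    }


# Usage with an existing DataFrame `df` (requires Spark and the spark-redshift
# package; the format name below is the Databricks one and is an assumption,
# other builds of the library use a different package name):
#
# opts = redshift_write_options(
#     "jdbc:redshift://example-host:5439/dev?user=USER&password=PASS",
#     "my_table",
#     "s3a://example-bucket/tmp/",
# )
# df.write.format("com.databricks.spark.redshift").options(**opts) \
#     .mode("append").save()
```

Validating the value up front means a typo such as `CSV_GZIP` fails in the driver rather than partway through the job.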
Here is another benchmark:
https://www.stitchdata.com/blog/redshift-database-benchmarks-copy-performance-of-csv-json-and-avro/
Thanks.
Parquet is fast, and Redshift recently added support for it: https://aws.amazon.com/about-aws/whats-new/2018/06/amazon-redshift-can-now-copy-from-parquet-and-orc-file-formats/
@iShiBin Yes, but this library does not support Parquet as temp format.
@gorros Is there a way around this? It's been 6 years since your post and the `tempformat` option still only supports `AVRO`, `CSV`, and `CSV GZIP`! There's got to be a way to save files as Parquet.
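One possible workaround, sketched here under assumptions, is to bypass the library for the load step: write the DataFrame to S3 as Parquet with Spark's own writer, then run a Redshift `COPY ... FORMAT AS PARQUET` yourself. `parquet_copy_statement` is a hypothetical helper, and the table, bucket, and IAM role in the usage comment are placeholders. Note that this gives up what the library does for you, such as creating the table.

```python
def parquet_copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY statement that loads Parquet files from S3.

    Hypothetical helper: the inputs are interpolated as-is, so pass only
    trusted values.
    """
    # Redshift expects the s3:// scheme, while Spark usually writes via s3a://.
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must use the s3:// scheme")
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )


# Usage with an existing DataFrame `df` (placeholder names throughout):
#
# df.write.mode("overwrite").parquet("s3a://example-bucket/staging/my_table/")
# sql = parquet_copy_statement(
#     "my_table",
#     "s3://example-bucket/staging/my_table/",
#     "arn:aws:iam::123456789012:role/example-redshift-role",
# )
# Then run `sql` against the cluster with any SQL client or JDBC connection.
```

The target table has to exist already with columns matching the Parquet schema, since `COPY` does not create it.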