spark-redshift

Avro tempformat is slow

Open gorros opened this issue 6 years ago • 3 comments

Hi. I would like to suggest setting the default value of tempformat to CSV GZIP. In my experience, AVRO is very slow. Here is another benchmark that shows the same: https://www.stitchdata.com/blog/redshift-database-benchmarks-copy-performance-of-csv-json-and-avro/

Thanks.

gorros avatar May 31 '18 16:05 gorros
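
For readers who want to try the suggestion above without waiting for a default change, here is a minimal sketch of overriding tempformat per write. The helper name `redshift_write_options`, the JDBC URL, the table name, and the bucket path are all placeholders made up for illustration; the set of accepted values is taken from the formats named in this thread. Authentication options that the connector also needs are left out.

```python
# Values this thread lists as accepted by the tempformat option.
SUPPORTED_TEMPFORMATS = {"AVRO", "CSV", "CSV GZIP"}


def redshift_write_options(url, dbtable, tempdir, tempformat="CSV GZIP"):
    """Build an options dict for a spark-redshift write (hypothetical helper)."""
    if tempformat not in SUPPORTED_TEMPFORMATS:
        raise ValueError(f"unsupported tempformat: {tempformat!r}")
    return {
        "url": url,                # JDBC URL of the Redshift cluster
        "dbtable": dbtable,        # target table
        "tempdir": tempdir,        # S3 staging directory used for the COPY
        "tempformat": tempformat,  # staging file format, CSV GZIP by default here
    }


# Placeholder values, not real endpoints.
opts = redshift_write_options(
    url="jdbc:redshift://example-host:5439/dev?user=USER&password=PASS",
    dbtable="my_table",
    tempdir="s3a://example-bucket/tmp/",
)

# With a live SparkSession and a DataFrame `df`, the write would look roughly like:
# df.write.format("com.databricks.spark.redshift").options(**opts).mode("append").save()
```

Whether CSV GZIP is actually faster for a given table depends on the data, so it is worth timing both settings on a representative load.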

Parquet is fast, and Redshift recently added support for it: https://aws.amazon.com/about-aws/whats-new/2018/06/amazon-redshift-can-now-copy-from-parquet-and-orc-file-formats/

iShiBin avatar Jul 16 '18 16:07 iShiBin

@iShiBin Yes, but this library does not support Parquet as a temp format.

gorros avatar Jul 26 '18 10:07 gorros

@gorros Is there a way around this? It's been 6 years since your post and the tempformat option still only supports AVRO, CSV, and CSV GZIP! There's got to be a way to save files as Parquet.

datasurfergtx avatar Jul 02 '24 16:07 datasurfergtx
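
One possible workaround, sketched here as an assumption rather than anything this thread or the library confirms, is to bypass tempformat entirely: write the DataFrame to S3 as Parquet with Spark's own writer, then run a Redshift COPY with FORMAT AS PARQUET yourself. The helper `build_parquet_copy`, the table name, the bucket path, and the IAM role ARN below are hypothetical placeholders.

```python
def build_parquet_copy(table, bucket_path, iam_role):
    """Return a Redshift COPY statement that loads Parquet files from S3.

    Hypothetical helper: bucket_path is "bucket/prefix/" without a scheme.
    """
    for value in (bucket_path, iam_role):
        # Refuse values that would break out of the quoted SQL literal.
        if "'" in value:
            raise ValueError(f"single quote not allowed in {value!r}")
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )


# Placeholder values, not real resources.
bucket_path = "example-bucket/staging/my_table/"
statement = build_parquet_copy(
    table="my_table",
    bucket_path=bucket_path,
    iam_role="arn:aws:iam::123456789012:role/example-redshift-role",
)

# Step 1 (needs a live SparkSession and DataFrame `df`):
# df.write.mode("overwrite").parquet(f"s3a://{bucket_path}")
# Step 2: execute `statement` against Redshift with any SQL client or driver.
```

This gives up what the connector does for you (table creation, staging cleanup, and type mapping), and the Parquet column order and types have to line up with the target table, so treat it as a starting point only.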