Add parallel writing when writing pdarrays to Parquet

Open bmcdonald3 opened this issue 1 year ago • 0 comments

This PR adds support for a parallelWriteThreshold flag that lets a user set the size of the Parquet files to be written. Those files are then written in parallel, with a _CORE#### suffix appended to the end of each file name.

By running the Arkouda server with: ./arkouda_server --ParquetMsg.parallelWriteThreshold=<num>, a user can control the size of the files that are written.
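To illustrate how a threshold could translate into a set of output files, here is a minimal Python sketch. It is not the server's implementation: the helper name plan_parallel_write is hypothetical, the threshold is assumed to be a count of elements per file (the description does not say whether it is elements or bytes), and the four-digit zero padding is inferred from the _CORE#### pattern. Any per-locale splitting the server does is not modelled.

```python
import math


def plan_parallel_write(prefix, num_elements, threshold):
    """Hypothetical helper: split an array of num_elements into files.

    Assumes `threshold` is the maximum number of elements per file;
    the real flag's unit is not stated in the PR description.
    Returns a list of (file_name, start, stop) with stop exclusive.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Always produce at least one file, even for an empty array.
    num_files = max(1, math.ceil(num_elements / threshold))
    plan = []
    for i in range(num_files):
        start = i * threshold
        stop = min(start + threshold, num_elements)
        # Suffix mirrors the _CORE#### pattern from the description.
        plan.append((f"{prefix}_CORE{i:04d}", start, stop))
    return plan


if __name__ == "__main__":
    for name, start, stop in plan_parallel_write("data", 10, 4):
        print(name, start, stop)
```

Under these assumptions, a smaller threshold yields more, smaller files, each of which could be written by an independent task.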

This is currently only supported on pdarrays of natively supported datatypes (meaning not strings or dataframes), but follow-up work is on the way.

bmcdonald3, Dec 13 '23 18:12