arkouda
Add parallel writing when writing pdarrays to Parquet
This PR adds support for a parallelWriteThreshold flag that lets a user set the size of the Parquet files to be written and then writes those files in parallel, appending a _CORE#### suffix to the end of each file name.
By running the Arkouda server with
./arkouda_server --ParquetMsg.parallelWriteThreshold=<num>
a user can control the size of the files that are written.
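To illustrate the idea of splitting one array into several suffixed files, here is a minimal Python sketch. It is not Arkouda's implementation: the function name plan_parallel_write is hypothetical, the threshold is assumed to mean a maximum number of elements per file (this description does not state the unit), and the suffix is assumed to be a four-digit, zero-padded index, inferred only from the _CORE#### pattern.

```python
def plan_parallel_write(num_elements, threshold, base_name):
    """Return a (file_name, start, stop) tuple for each chunk.

    Hypothetical helper: assumes `threshold` is the maximum number of
    elements per file and that the suffix is a zero-padded chunk index.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    plan = []
    # Walk the array in steps of `threshold`; the last chunk may be shorter.
    for index, start in enumerate(range(0, num_elements, threshold)):
        stop = min(start + threshold, num_elements)
        plan.append((f"{base_name}_CORE{index:04d}", start, stop))
    return plan


# Each entry could then be handed to an independent writer task.
for file_name, start, stop in plan_parallel_write(10, 4, "data"):
    print(file_name, start, stop)
```

Under these assumptions, a smaller threshold produces more, smaller files, and each chunk can be written independently of the others.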
This is currently supported only for pdarrays of natively supported datatypes (that is, not strings or dataframes), but follow-up work is on the way.