
Closes #2880: Add parallel writing when writing pdarrays to Parquet

Open bmcdonald3 opened this issue 1 year ago • 1 comment

This PR (closes #2880) adds support for a parallelWriteThreshold flag that lets a user set the size of the Parquet files to be written. Those files are then written in parallel, with _CORE#### appended to the end of each file name.

By running the Arkouda server with: ./arkouda_server --ParquetMsg.parallelWriteThreshold=<num>, a user is able to control the size of the files that are going to be written.
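
To make the naming scheme concrete, here is a rough sketch of how a write could be split into per-file chunks. It is not the code in this PR: `plan_parallel_files` is a hypothetical helper, and it assumes the threshold is a maximum element count per file, that `####` is a zero-padded four-digit index, and that the suffix goes at the very end of the given name.

```python
import math


def plan_parallel_files(n_elems, threshold, base_name):
    """Hypothetical helper, not part of Arkouda.

    Splits `n_elems` elements into chunks of at most `threshold` elements
    (an assumed interpretation of the flag) and returns a list of
    (file_name, start, stop) tuples, one per file to be written.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    # Always plan at least one file, even for an empty array.
    n_files = max(1, math.ceil(n_elems / threshold))

    plan = []
    for i in range(n_files):
        start = i * threshold
        stop = min(start + threshold, n_elems)
        # Assumed suffix format: _CORE followed by a zero-padded index.
        plan.append((f"{base_name}_CORE{i:04d}", start, stop))
    return plan


if __name__ == "__main__":
    for name, start, stop in plan_parallel_files(10, 4, "out"):
        print(name, start, stop)
```

Each tuple describes an independent slice, so the chunks could be handed to separate tasks and written concurrently.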

This is currently only supported on pdarrays of natively supported datatypes (meaning not strings or dataframes), but follow-up work is on the way.

bmcdonald3, Dec 13 '23 18:12

Since @bmcdonald3 is out and none of my comments are blocking, I'll go ahead and merge this. Thanks again, Ben!!!

EDIT: apparently Ben is in and he wants to hold off on merging until string support is in

stress-tess, Dec 14 '23 18:12