Why doesn't Parquet currently support writing multiple row groups simultaneously?

Open · muyihao opened this issue 1 year ago · 1 comment

Hi Parquet developers,

I have a question regarding the current implementation of Parquet. As far as I understand, Parquet does not support writing multiple row groups simultaneously. Could you please explain the reasoning behind this design choice?

Additionally, I am considering modifying Parquet to allow for multiple row groups to exist in memory and be flushed sequentially. From a high-level perspective, does this approach seem feasible? Are there any potential pitfalls or challenges I should be aware of?

Thank you for your time and assistance.

Best regards,

muyihao, Jun 24 '24 11:06

This would complicate the implementation and result in a large memory footprint, because every open row group would have to stay buffered in memory until it is flushed. Would it make sense to use multiple file writers instead?

wgtmac, Jun 24 '24 14:06
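
For readers wondering what the "multiple file writers" suggestion could look like, here is a minimal sketch of the pattern only, not parquet-java code. It assumes the input has already been split into partitions and gives each partition its own writer and its own output file on a small thread pool. The class `MultiWriterSketch`, the methods `writePartitions` and `writeOne`, and the `part-N.txt` naming are all hypothetical, and a plain text writer stands in for a real Parquet writer so that the example runs with the JDK alone.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MultiWriterSketch {

    // Hypothetical stand-in for a per-file Parquet writer:
    // writes one partition of records to its own file, one record per line.
    static Path writeOne(Path dir, int index, List<String> records) {
        Path file = dir.resolve("part-" + index + ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file)) {
            for (String record : records) {
                writer.write(record);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // Writes every partition concurrently, one independent writer per file,
    // and returns the output paths in partition order.
    static List<Path> writePartitions(Path dir, List<List<String>> partitions) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Cap the pool at 4 threads (an arbitrary choice for this sketch).
        int threads = Math.max(1, Math.min(partitions.size(), 4));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < partitions.size(); i++) {
                final int index = i;
                futures.add(pool.submit(() -> writeOne(dir, index, partitions.get(index))));
            }
            List<Path> files = new ArrayList<>();
            for (Future<Path> future : futures) {
                files.add(future.get()); // blocks until that writer has closed its file
            }
            return files;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `MultiWriterSketch.writePartitions(dir, partitions)` produces one file per partition. Each writer only ever holds its own buffer, so the writers never need to coordinate over where their data lands in a shared file; the trade-off is that the result is several files rather than one.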