parquet-java

parquet-cli rewrite option

Open MyDELearnings opened this issue 1 year ago • 1 comment

Describe the usage question you have. Please include as many useful details as possible.

Hi,

Is it possible to have parquet-cli read directly from a GCS bucket to prune a column, for example: rewrite -i gs:/sourcebbucket/part-00549.parquet -o gs://targetbucket/newdata/dd --prune-columns col4

I am getting this error: java.lang.RuntimeException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"

Component(s)

No response

MyDELearnings • Aug 07 '24 15:08

I don't think we can directly use parquet-cli to rewrite files from a cloud object store. You can either download them and rewrite them locally, or use the ParquetWriter API and set the file system configuration programmatically.
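For context on the error: Hadoop picks a FileSystem implementation from the URI scheme, looking up the property `fs.<scheme>.impl` (or a FileSystem registered by a jar on the classpath), and "No FileSystem for scheme" means nothing was found for `gs`. The following is only a minimal, stdlib-only sketch of the configuration entries involved. The class `GcsSchemeConfig` and its methods are hypothetical names for illustration, and the two implementation class names assume the Google Cloud Storage connector for Hadoop is on the classpath.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcsSchemeConfig {

    // Assumption: class names provided by the GCS connector for Hadoop (a separate jar).
    public static final String GCS_FS_IMPL =
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem";
    public static final String GCS_ABSTRACT_FS_IMPL =
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS";

    /** Derives the Hadoop property name that maps a location's scheme to a FileSystem class. */
    public static String fileSystemImplKey(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in: " + location);
        }
        return "fs." + scheme + ".impl";
    }

    /** Entries one would set on a Hadoop Configuration so that "gs" resolves. */
    public static Map<String, String> gcsProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.gs.impl", GCS_FS_IMPL);
        props.put("fs.AbstractFileSystem.gs.impl", GCS_ABSTRACT_FS_IMPL);
        return props;
    }

    public static void main(String[] args) {
        // Output path taken from the question above.
        System.out.println(fileSystemImplKey("gs://targetbucket/newdata/dd"));
        gcsProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each entry would then be applied with `Configuration.set(key, value)` on the `org.apache.hadoop.conf.Configuration` handed to the writer. Credentials and other connector settings are omitted here and depend on the environment.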

wgtmac • Aug 08 '24 07:08