
[Data] Dataset.write_parquet(**arrow_parquet_args) does not work

Open · bbtfr opened this issue 1 year ago · 0 comments

What happened + What you expected to happen

ray.data.Dataset.write_parquet(**arrow_parquet_args)

arrow_parquet_args no longer takes effect, because it is not passed to pq.ParquetWriter here: https://github.com/ray-project/ray/blob/d9e795c17a6d4fe61fa57f691c9bcc60dcace72e/python/ray/data/datasource/parquet_datasink.py#L75

The same applies to arrow_parquet_args_fn.
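A minimal sketch of how the static arguments and the output of arrow_parquet_args_fn could be combined before reaching the writer. resolve_writer_kwargs is a hypothetical helper, and letting the function's values override the static ones is an assumption of this sketch. The point of the report is that, however the merged dict is built, it is never passed on to pq.ParquetWriter.

```python
from typing import Any, Callable, Dict, Optional


def resolve_writer_kwargs(
    arrow_parquet_args_fn: Optional[Callable[[], Dict[str, Any]]],
    **arrow_parquet_args: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: merge static kwargs with those returned by
    arrow_parquet_args_fn. On key conflicts the function's values win
    (an assumption of this sketch, not confirmed Ray behavior)."""
    kwargs = dict(arrow_parquet_args)
    if arrow_parquet_args_fn is not None:
        kwargs.update(arrow_parquet_args_fn())
    return kwargs


merged = resolve_writer_kwargs(
    lambda: {"compression_level": 3},
    compression="ZSTD",
    compression_level=100,
)
print(merged)  # {'compression': 'ZSTD', 'compression_level': 3}
# The fix would then pass these on: pq.ParquetWriter(path, schema, **merged)
```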

Versions / Dependencies

Ubuntu 22.04 LTS, Python 3.10.14, Ray 2.22.0

Reproduction script

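import ray

# Hypothetical placeholders: substitute the real paths and column names.
input_path = "/path/to/input"
output_path = "/path/to/output"
columns = ["col_a", "col_b"]

ray.data.read_parquet(
    input_path,
    columns=columns,
).write_parquet(
    output_path,
    compression="ZSTD",
    compression_level=100,
)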

Issue Severity

High: It blocks me from completing my task.

bbtfr · May 22 '24 10:05