[Data] Dataset.write_parquet(**arrow_parquet_args) does not work
What happened + What you expected to happen
ray.data.Dataset.write_parquet(**arrow_parquet_args)
arrow_parquet_args (for example, compression and compression_level in the script below) no longer take effect, because they are not passed to pq.ParquetWriter here:
https://github.com/ray-project/ray/blob/d9e795c17a6d4fe61fa57f691c9bcc60dcace72e/python/ray/data/datasource/parquet_datasink.py#L75
The same problem affects arrow_parquet_args_fn.
Versions / Dependencies
Ubuntu 22.04 LTS Python 3.10.14 Ray 2.22.0
Reproduction script
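```python
import ray

# Placeholders: replace with a real input directory, column subset, and output directory.
input_path = "/path/to/input"
columns = ["col_a", "col_b"]
output_path = "/path/to/output"

ray.data.read_parquet(
    input_path,
    columns=columns,
).write_parquet(
    output_path,
    compression="ZSTD",
    compression_level=100,
)
```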
Issue Severity
High: It blocks me from completing my task.