
Implementing hive-style read

Open MisterKloudy opened this issue 4 months ago • 0 comments

When PySpark writes Parquet files partitioned by a column, it creates subdirectories named partition_column=some_value. When I use Daft's read_parquet on the parent folder, I would like to get back the columns of the table that were used as partitions. It would be helpful if Daft could parse the key=value pairs from the Hive-style partition paths back into columns.
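
For example, assuming a table partitioned on a `country` column, the layout and the behaviour I'm hoping for might look like the sketch below (the paths and the `hive_partitioning` flag are hypothetical, just to illustrate the requested API):

```python
# Hive-style layout produced by PySpark's partitioned write:
#   /data/events/country=US/part-00000.parquet
#   /data/events/country=SG/part-00000.parquet

import daft

# Today, reading the parent folder loses the partition column, because
# `country` only exists in the directory names, not in the Parquet files.
df = daft.read_parquet("/data/events/**/*.parquet")

# Desired behaviour (hypothetical flag): parse key=value path segments
# back into a `country` column on the resulting DataFrame.
df = daft.read_parquet("/data/events/", hive_partitioning=True)
df.select("country").show()
```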

MisterKloudy · Sep 27 '24 15:09