aws-sdk-pandas

Support AWS DMS Partitions in Method store_parquet_metadata

Open FleischerT opened this issue 2 years ago • 2 comments

Is your idea related to a problem? Please describe.

I am trying to create Glue tables for data in S3 that AWS DMS wrote as partitioned Parquet files. The problem is that AWS DMS writes partitions in the format "2023/05/01/" rather than the Hive convention "year=2023/month=05/day=01/". When I try to create the Glue tables with the Wrangler method "store_parquet_metadata", the partitions are not recognized, because the internal method "_extract_partitions_metadata_from_paths" only picks up path segments containing "=".
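To illustrate the behaviour described above, here is a minimal sketch of Hive-style partition parsing. It is not the library's actual "_extract_partitions_metadata_from_paths" implementation; the helper name `hive_partitions` and the bucket paths are hypothetical and only show why a path without "=" yields no partitions.

```python
from typing import Dict


def hive_partitions(path: str, root: str) -> Dict[str, str]:
    """Collect key/value pairs from `key=value` directory segments under root.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    relative = path[len(root):] if path.startswith(root) else path
    dirs = relative.split("/")[:-1]  # drop the trailing file name
    # Segments without "=" are silently skipped, as in Hive-style parsing.
    return dict(seg.split("=", 1) for seg in dirs if "=" in seg)


root = "s3://bucket/table/"  # assumed example location
hive_path = root + "year=2023/month=05/day=01/part-0.parquet"
dms_path = root + "2023/05/01/part-0.parquet"

print(hive_partitions(hive_path, root))  # three partition columns found
print(hive_partitions(dms_path, root))   # empty: no segment contains "="
```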

Describe the solution you'd like

Currently only Hive-conformant partitioning seems to be supported. It would be better if the partition keys could be passed explicitly when calling the method.
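A rough sketch of what passing the partition keys could look like: the directory levels under the table root are mapped positionally onto caller-supplied names. The function `positional_partitions` and its `keys` parameter are assumptions made for illustration, not an existing awswrangler API.

```python
from typing import Dict, List


def positional_partitions(path: str, root: str, keys: List[str]) -> Dict[str, str]:
    """Map directory levels under root onto the given partition keys, in order.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    if not path.startswith(root):
        raise ValueError(f"{path!r} is not under {root!r}")
    # Directory segments between the root and the file name.
    dirs = [seg for seg in path[len(root):].split("/")[:-1] if seg]
    if len(dirs) != len(keys):
        raise ValueError(f"expected {len(keys)} partition levels, found {len(dirs)}")
    return dict(zip(keys, dirs))


example = positional_partitions(
    "s3://bucket/table/2023/05/01/part-0.parquet",  # assumed DMS-style path
    "s3://bucket/table/",
    ["year", "month", "day"],
)
print(example)
```

Raising on a level-count mismatch is a design choice in this sketch: it avoids silently registering partitions with the wrong column names when a prefix contains unexpected extra folders.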

FleischerT avatar May 30 '23 09:05 FleischerT

Hi @FleischerT, correct: we currently only support Hive-style partitions. We'll discuss with the team and get back to you.

kukushking avatar Jun 05 '23 10:06 kukushking

Why not support something like a DatePartitionHive option in S3Settings?

webysther avatar Dec 20 '23 20:12 webysther