aws-sdk-pandas
Support AWS DMS Partitions in Method store_parquet_metadata
Is your idea related to a problem? Please describe. I am trying to create Glue tables for data in S3 that AWS DMS wrote as partitioned Parquet files. The problem is that AWS DMS writes partitions in the format "2023/05/01/" rather than the Hive-style "year=2023/month=05/day=01/". When I try to create the Glue tables with the Wrangler method "store_parquet_metadata", the partitions are not recognized because the internal method "_extract_partitions_metadata_from_paths" filters for "=".
Describe the solution you'd like Currently only Hive-style partitioning seems to be supported. It would be better if the partition keys could be passed in when calling the method.
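To make the request concrete, here is a rough sketch of the positional mapping I have in mind. It is plain Python and not part of awswrangler: the function names, the `partition_keys` argument, and the example bucket paths are all hypothetical, and it only assumes that each directory level below the table root corresponds to one partition key, in order.

```python
from typing import Dict, List, Sequence


def extract_positional_partitions(
    paths: Sequence[str],
    root: str,
    partition_keys: Sequence[str],
) -> Dict[str, List[str]]:
    """Map each partition directory to its values, taken by position.

    Hypothetical helper: the first directory below `root` is treated as the
    value of partition_keys[0], the second as partition_keys[1], and so on.
    """
    root = root.rstrip("/") + "/"
    result: Dict[str, List[str]] = {}
    for path in paths:
        if not path.startswith(root):
            continue
        # Drop the file name and keep only the directory segments below root.
        segments = path[len(root):].split("/")[:-1]
        if len(segments) < len(partition_keys):
            continue  # not deep enough to carry every partition key
        values = segments[: len(partition_keys)]
        result[root + "/".join(values) + "/"] = values
    return result


def hive_style_suffix(partition_keys: Sequence[str], values: Sequence[str]) -> str:
    """Render the same values in Hive style, e.g. year=2023/month=05/day=01/."""
    return "".join(f"{key}={value}/" for key, value in zip(partition_keys, values))


# Example paths in the layout AWS DMS produces (bucket and table are made up).
keys = ["year", "month", "day"]
paths = [
    "s3://bucket/table/2023/05/01/file1.parquet",
    "s3://bucket/table/2023/05/01/file2.parquet",
    "s3://bucket/table/2023/05/02/file3.parquet",
]
partitions = extract_positional_partitions(paths, "s3://bucket/table", keys)
```

The result maps each partition directory to its list of values, which looks like the shape `wr.catalog.add_parquet_partitions` takes as `partitions_values`, though I have not verified that against the current release.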
Hi @FleischerT, that's correct: we currently only support Hive-style partitions. We'll discuss this with the team and get back to you.
Why not support something like DatePartitionHive on S3Settings?