aws-sdk-pandas

Support AWS DMS Partitions in Method store_parquet_metadata

Open FleischerT opened this issue 2 years ago • 2 comments

Is your idea related to a problem? Please describe. I am trying to create Glue tables for data in S3 that AWS DMS writes as partitioned Parquet files. The problem is that AWS DMS writes the partitions in the format "2023/05/01/" rather than in the Hive style, "year=2023/month=05/day=01". When I try to create the Glue tables with the Wrangler method "store_parquet_metadata", the partitions are not recognized, because the internal method "_extract_partitions_metadata_from_paths" filters for "=".
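To illustrate the behavior described above, here is a minimal sketch of "="-based partition extraction. It is a simplified stand-in, not the library's actual `_extract_partitions_metadata_from_paths` code; the function name `extract_hive_partitions` and the bucket and table names are made up for the example.

```python
from typing import Dict


def extract_hive_partitions(path: str) -> Dict[str, str]:
    """Collect key/value pairs from path segments shaped like key=value.

    Hypothetical helper for illustration only; segments without "=" are ignored.
    """
    partitions: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


# Example paths (bucket and table names are assumptions).
hive_path = "s3://bucket/table/year=2023/month=05/day=01/file.parquet"
dms_path = "s3://bucket/table/2023/05/01/file.parquet"

hive_result = extract_hive_partitions(hive_path)  # partitions found
dms_result = extract_hive_partitions(dms_path)    # nothing found: no "=" in any segment
```

With a filter like this, the DMS-style path yields no partitions at all, which matches the symptom reported here.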

Describe the solution you'd like Currently, only Hive-conformant partitioning seems to be supported. It would be better if the partition keys could be passed explicitly when calling the method.
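One way the requested behavior could look is sketched below: the caller supplies the partition keys, and the directory levels under the table root are mapped to them by position. This is only an assumption about a possible design; `extract_positional_partitions`, its parameters, and the example paths are hypothetical and not part of aws-sdk-pandas.

```python
from typing import Dict, List


def extract_positional_partitions(
    path: str, table_root: str, partition_keys: List[str]
) -> Dict[str, str]:
    """Map the directory levels under table_root to caller-supplied keys.

    Hypothetical helper for illustration only.
    """
    if not path.startswith(table_root):
        raise ValueError(f"{path!r} is not under {table_root!r}")
    relative = path[len(table_root):].strip("/")
    # Drop the last element, which is the file name.
    directories = relative.split("/")[:-1]
    if len(directories) < len(partition_keys):
        raise ValueError("path has fewer directory levels than partition keys")
    return dict(zip(partition_keys, directories[: len(partition_keys)]))


# Example with a DMS-style date layout (bucket and table names are assumptions).
result = extract_positional_partitions(
    path="s3://bucket/table/2023/05/01/file.parquet",
    table_root="s3://bucket/table/",
    partition_keys=["year", "month", "day"],
)
```

Here `result` maps "year", "month", and "day" to "2023", "05", and "01", so the same partition values a Hive-style path would carry are recovered from position alone.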

May 30 '23 09:05 FleischerT

Hi @FleischerT, that's correct: we currently only support Hive-style partitions. We'll discuss it with the team and get back to you.

Jun 05 '23 10:06 kukushking

Why not support something like DatePartitionHive in S3Settings?

Dec 20 '23 20:12 webysther
