cloudtrail-parquet-glue

Fixed partitioning issue during raw-to-Parquet conversion

Open · andrew-kline opened this issue 3 years ago • 1 comment

Changed the column mappings in glue_etl.py so that the positional partition names Glue generates ("partition_0" through "partition_6") map to meaningful names: awslogs, account, region, etc. This should fix the errors reported in Issue 1.
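For context: when S3 folder names are not in Hive-style key=value form, a Glue crawler names the partition columns positionally (partition_0, partition_1, ...). Assuming the default CloudTrail key layout AWSLogs/<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/<file>, the seven path levels line up with the seven partition_[0-6] columns. Below is a minimal sketch of such a mapping; the target names, their order, and the helper partition_mappings are hypothetical and not copied from glue_etl.py. The tuples follow the (source column, source type, target column, target type) shape that Glue's ApplyMapping transform accepts.

```python
# Hypothetical target names, assuming the default CloudTrail S3 key layout:
# AWSLogs/<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/<file>
CLOUDTRAIL_PARTITION_NAMES = [
    "awslogs",     # partition_0: literal "AWSLogs"
    "account",     # partition_1: 12-digit account ID
    "cloudtrail",  # partition_2: literal "CloudTrail"
    "region",      # partition_3: e.g. us-east-1
    "year",        # partition_4
    "month",       # partition_5
    "day",         # partition_6
]


def partition_mappings(names=CLOUDTRAIL_PARTITION_NAMES, col_type="string"):
    """Build ApplyMapping-style tuples that rename Glue's positional
    partition_N columns to the names above (hypothetical helper)."""
    return [
        (f"partition_{i}", col_type, name, col_type)
        for i, name in enumerate(names)
    ]


if __name__ == "__main__":
    for mapping in partition_mappings():
        print(mapping)
```

ApplyMapping keeps only the columns it is given, so in a Glue job these tuples would be concatenated with the tuples for the event data columns, e.g. ApplyMapping.apply(frame=dyf, mappings=data_mappings + partition_mappings()), where dyf and data_mappings are placeholders.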

andrew-kline · Mar 08 '21 22:03

At a minimum, we need a solution that understands all the path variables CloudTrail can introduce. It would be even better to have the option to define the structure ourselves, because there are other use cases, especially in enterprise-level accounts. There, the data is often not written directly by CloudTrail: Organizations does not allow child accounts to access the original trail data, so they rely on something like a "write-back" mechanism that copies it to a different S3 bucket.
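One way to read this request is a key parser that handles the optional variations in CloudTrail's layout and also accepts a user-supplied pattern. The sketch below assumes two variations: an optional bucket prefix before AWSLogs/, and an optional organization ID level (o-...) that organization trails can add before the account ID. The pattern, the group names, and parse_key are hypothetical illustrations, not part of this repository; the "write-back" layout in the usage example is invented for demonstration.

```python
import re

# Hypothetical default pattern:
# [<prefix>/]AWSLogs/[<org-id>/]<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/<file>
CLOUDTRAIL_KEY = re.compile(
    r"^(?:(?P<prefix>.+)/)?AWSLogs/"
    r"(?:(?P<org_id>o-[a-z0-9]{10,32})/)?"
    r"(?P<account>\d{12})/CloudTrail/"
    r"(?P<region>[a-z0-9-]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<file>[^/]+)$"
)


def parse_key(key, pattern=CLOUDTRAIL_KEY):
    """Return the named path variables of an S3 key, or None if the key
    does not fit the layout (e.g. CloudTrail-Digest files)."""
    match = pattern.match(key)
    return match.groupdict() if match else None


# A user-defined layout, e.g. for a hypothetical write-back bucket that
# stores <account-id>/<region>/<YYYY>/<MM>/<DD>/<file>.
WRITE_BACK_KEY = re.compile(
    r"^(?P<account>\d{12})/(?P<region>[a-z0-9-]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<file>[^/]+)$"
)

if __name__ == "__main__":
    print(parse_key("logs/AWSLogs/o-abc1234567/123456789012/CloudTrail/"
                    "eu-west-1/2021/08/08/events.json.gz"))
```

Passing a different compiled pattern, such as WRITE_BACK_KEY, to parse_key is how a custom structure would plug in; the partition names then come from the regex group names instead of being hard-coded.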

BigDataDaddy · Aug 08 '21 17:08