Possible to write Spark DataFrames to Glue tables in a similar fashion to awswrangler.s3.to_parquet

Open aabid0193 opened this issue 3 years ago • 4 comments

If it isn't possible already, it would be nice if we could use Spark DataFrames to write to Glue tables using something similar to wrangler's to_parquet method. It works great for pandas and has the ability to set the mode to overwrite partitions, and I was wondering if we can do the same with Spark DataFrames.

aabid0193 avatar Nov 03 '22 22:11 aabid0193

wranglers to_parquet method. It works great for pandas and has the ability to set the mode to overwrite partitions and was wondering if we can do this with spark dataframes.

If you are using Spark, I would imagine that simply converting your Spark DataFrame to a pandas one would get you there if you want to use the wrangler.

sparkDF.toPandas()
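
To spell out the suggestion above, here is a minimal sketch of the convert-then-write route. The function name and its parameters are hypothetical, chosen only for illustration; `wr.s3.to_parquet` with `dataset=True`, `mode="overwrite_partitions"`, `partition_cols`, `database` and `table` is the existing pandas API being referred to. It assumes the whole DataFrame fits in the driver's memory, since `toPandas()` collects every row there.

```python
from typing import Any, Sequence


def write_spark_df_with_wrangler(
    spark_df: Any,
    path: str,
    database: str,
    table: str,
    partition_cols: Sequence[str],
) -> Any:
    """Convert a Spark DataFrame to pandas, then write it as a Glue-registered dataset.

    Hypothetical helper: only suitable when the data fits in driver memory.
    """
    # Imported lazily so the module can be loaded without awswrangler installed.
    import awswrangler as wr

    # Collects the full dataset onto the driver as a pandas DataFrame.
    pandas_df = spark_df.toPandas()

    # Writes Parquet files under `path`, replaces only the partitions present
    # in `pandas_df`, and creates or updates the Glue table.
    return wr.s3.to_parquet(
        df=pandas_df,
        path=path,
        dataset=True,
        mode="overwrite_partitions",
        partition_cols=list(partition_cols),
        database=database,
        table=table,
    )
```

A call might look like `write_spark_df_with_wrangler(sparkDF, "s3://my-bucket/my-table/", "my_db", "my_table", ["year"])`, where the bucket, database and table names are placeholders.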

emerson131 avatar Nov 04 '22 00:11 emerson131

Yeah, that is something you can do right now. However, for large datasets that require Spark, this wouldn't be ideal.

aabid0193 avatar Nov 04 '22 00:11 aabid0193

Essentially, what I'm wishing for is the ability to register Athena tables based on the PySpark DataFrame metadata. I see that this was implemented here: https://github.com/aws/aws-sdk-pandas/issues/29. However, it seems to me that this method is no longer supported in the newer versions of wrangler. Additionally, I would like to be able to overwrite partitions.
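
As a possible workaround rather than a built-in feature, the request above could be sketched by letting Spark write the files and using wrangler only for the catalog step. The helper names `split_spark_dtypes` and `write_and_register` are hypothetical. The sketch assumes that the type strings in `spark_df.dtypes` are acceptable as Athena/Glue column types (true for common primitives such as `bigint`, `string`, `double`, but not verified for every type), and that `path` is an `s3://` URI that both Spark and wrangler can reach.

```python
from typing import Any, Dict, Sequence, Tuple


def split_spark_dtypes(
    dtypes: Sequence[Tuple[str, str]],
    partition_cols: Sequence[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split Spark (name, type) pairs into data-column and partition-column maps."""
    type_by_name = dict(dtypes)
    missing = [name for name in partition_cols if name not in type_by_name]
    if missing:
        raise ValueError(f"Partition columns not in DataFrame: {missing}")
    columns_types = {n: t for n, t in dtypes if n not in partition_cols}
    # Keep the partition order identical to the order passed to partitionBy.
    partitions_types = {n: type_by_name[n] for n in partition_cols}
    return columns_types, partitions_types


def write_and_register(
    spark: Any,
    spark_df: Any,
    path: str,
    database: str,
    table: str,
    partition_cols: Sequence[str],
) -> None:
    """Write with Spark, overwriting only touched partitions, then register in Glue."""
    # Imported lazily so the module can be loaded without awswrangler installed.
    import awswrangler as wr

    # With "dynamic", mode("overwrite") replaces only the partitions that
    # appear in spark_df instead of wiping the whole dataset.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    spark_df.write.mode("overwrite").partitionBy(*partition_cols).parquet(path)

    columns_types, partitions_types = split_spark_dtypes(
        spark_df.dtypes, partition_cols
    )

    # Recreate the table definition from the Spark schema (assumed compatible).
    wr.catalog.create_parquet_table(
        database=database,
        table=table,
        path=path,
        columns_types=columns_types,
        partitions_types=partitions_types,
        mode="overwrite",
    )
    # Runs MSCK REPAIR TABLE so Athena discovers the partitions on S3.
    wr.athena.repair_table(table=table, database=database)
```

The schema comes from the DataFrame itself, so nothing is collected to the driver. An alternative is `wr.s3.store_parquet_metadata`, which infers the schema from the files already written instead of from Spark's metadata.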

aabid0193 avatar Nov 04 '22 00:11 aabid0193

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot] avatar Jan 03 '23 03:01 github-actions[bot]