
[Enhancement] Parquet Support

Open • hokieg3n1us opened this issue 4 years ago • 1 comment

Expand the data formats supported for ingest into GeoWave to include Apache Parquet.

Create a SparkParquetIngestDriver to ingest Parquet data from an S3 bucket or HDFS directory. It should be configurable to build the geometry either from a single column containing WKT or WKB, or, for point data, from separate longitude and latitude columns.
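The configurable geometry mapping could look roughly like the sketch below. It is illustrative only: `GeometryColumnConfig`, `point_from_row` and the column names are hypothetical, rows are modeled as plain dicts rather than Spark/Parquet records, and only Point geometries are handled. A real driver in GeoWave would more likely use JTS (`WKTReader`/`WKBReader`) to support all geometry types.

```python
import re
import struct
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical ingest option: which column(s) of a Parquet row hold the geometry.
@dataclass
class GeometryColumnConfig:
    wkt_column: Optional[str] = None  # single column with WKT text
    wkb_column: Optional[str] = None  # single column with WKB bytes
    lon_column: Optional[str] = None  # longitude column (point data)
    lat_column: Optional[str] = None  # latitude column (point data)

# Matches only "POINT (x y)"; other WKT geometry types are out of scope here.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*$", re.IGNORECASE
)

def parse_wkt_point(text: str) -> Tuple[float, float]:
    match = _WKT_POINT.match(text)
    if match is None:
        raise ValueError(f"unsupported WKT: {text!r}")
    return float(match.group(1)), float(match.group(2))

def parse_wkb_point(data: bytes) -> Tuple[float, float]:
    # Byte 0 is the byte order: 0 = big-endian (XDR), 1 = little-endian (NDR).
    if data[0] not in (0, 1):
        raise ValueError(f"invalid WKB byte order: {data[0]}")
    order = "<" if data[0] == 1 else ">"
    # Bytes 1-4: geometry type as uint32; 1 means Point.
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    if geom_type != 1:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")
    # Bytes 5-20: X and Y as IEEE-754 doubles.
    x, y = struct.unpack_from(order + "dd", data, 5)
    return x, y

def point_from_row(row: Mapping[str, object], cfg: GeometryColumnConfig) -> Tuple[float, float]:
    """Return (x, y) for one row, using whichever geometry source is configured."""
    if cfg.wkt_column is not None:
        return parse_wkt_point(str(row[cfg.wkt_column]))
    if cfg.wkb_column is not None:
        return parse_wkb_point(bytes(row[cfg.wkb_column]))
    if cfg.lon_column is not None and cfg.lat_column is not None:
        return float(row[cfg.lon_column]), float(row[cfg.lat_column])
    raise ValueError("no geometry column configured")
```

Keeping the column mapping in one config object means the same ingest code path serves all three layouts; only the option values passed to the driver change.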

Create a SparkParquetExportDriver to export data from GeoWave to an S3 bucket or HDFS directory in Parquet format. It should be configurable to write the geometry either as a single column containing WKT or WKB, or, for point data, as separate longitude and latitude columns.
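On the export side, the same three layouts are the inverse mapping. The sketch below is a hypothetical illustration (`point_to_columns`, the `mode` values and the default column names are assumptions, not GeoWave API). It handles Point geometries only and emits little-endian WKB; a real exporter would more likely use JTS `WKTWriter`/`WKBWriter`.

```python
import struct
from typing import Dict

def point_to_columns(
    x: float,
    y: float,
    mode: str,
    geometry_name: str = "geometry",
    lon_name: str = "longitude",
    lat_name: str = "latitude",
) -> Dict[str, object]:
    """Map one point to the output column(s) for the chosen export layout."""
    if mode == "wkt":
        # repr() gives the shortest round-trippable text for each float.
        return {geometry_name: f"POINT ({x!r} {y!r})"}
    if mode == "wkb":
        # Little-endian WKB Point: byte order 1, geometry type 1, then X and Y doubles.
        return {geometry_name: struct.pack("<BIdd", 1, 1, x, y)}
    if mode == "lonlat":
        return {lon_name: x, lat_name: y}
    raise ValueError(f"unknown geometry export mode: {mode!r}")
```

Each dict would become part of one output row; a Spark-based driver would apply this per feature before writing the Parquet files.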

hokieg3n1us, Apr 08 '20 19:04

I've written a plugin for the AWS Glue metastore that lets you ingest data described by a table in a Glue metastore, for example:

geowave ingest localToGw -f glue --glue.database geospatial --glue.table gdelt s3://location gdelt index1,index2

At the moment it only supports ingesting Parquet. It works, but it needs some cleanup. I'd be willing to contribute the source to the project if I can get sign-off from my employer.

michaeljfazio, May 19 '20 05:05