
[SEDONA-94] GeoParquet Reader Writer

Open ashar236 opened this issue 3 years ago • 0 comments

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

  • Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-94. The PR name follows the format [SEDONA-94] GeoParquet Support For Sedona.

What changes were proposed in this PR?

GeoParquet reader and writer support for Sedona is implemented in this PR.

  • This implementation is based on a fork of the Spark 3.1 Parquet reader/writer because the internal Parquet API changed heavily between Spark 3.2 and Spark 3.3.
  • The fork is also modified to support Spark 3.0 - 3.3, but users should expect its non-geospatial behavior to be identical to that of Spark 3.0/3.1/3.2.
  • VectorizedReader is removed from this reader to avoid compatibility issues; the Geometry type is not an atomic type anyway.
  • We also have a cleaner version that supports only Spark 3.3. It will be released when Sedona no longer needs to support Spark 3.0 - 3.2.
  • Geometry filter on BBox is not implemented. It will be introduced in a follow-up PR.
  • Spark 2.4 is not supported, and we have no plan to support it, since Sedona will drop Spark 2.4 support completely in its next major release.
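
For reviewers who want the support matrix above in executable form, here is a minimal sketch of a version gate. The helper name `geoparquet_supported` and the constant `SUPPORTED_SPARK_MINORS` are hypothetical and not part of Sedona; the sketch only encodes what this description states (Spark 3.0 - 3.3 covered, Spark 2.4 not) and treats every other version as not covered by this PR.

```python
# Hypothetical helper, NOT part of Sedona: it only encodes the support
# matrix stated in this PR description (Spark 3.0 - 3.3 yes, Spark 2.4 no).
SUPPORTED_SPARK_MINORS = {(3, 0), (3, 1), (3, 2), (3, 3)}


def geoparquet_supported(spark_version: str) -> bool:
    """Return True if the major.minor of spark_version is covered by this PR."""
    parts = spark_version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Raised for strings such as "3" or "abc" that lack a numeric major.minor.
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    # Versions outside the stated range are treated as "not covered",
    # which is an assumption rather than a statement about future support.
    return (major, minor) in SUPPORTED_SPARK_MINORS
```

In a PySpark session the version string could come from `spark.version`, so a caller might check `geoparquet_supported(spark.version)` before attempting a GeoParquet read or write.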

Additional notes:

  1. This PR will be merged after we release Sedona 1.2.1 and drop Spark 2.4 support.
  2. Before merging, we need to remove 'fail-fast: false' from the Scala/Java CI.

How was this patch tested?

Unit tests have been added.

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

ashar236, Jul 14 '22 23:07