[SEDONA-94] GeoParquet Reader Writer
Did you read the Contributor Guide?
- Yes, I have read Contributor Rules and Contributor Development Guide
Is this PR related to a JIRA ticket?
- Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-94. The PR name follows the format
[SEDONA-94] GeoParquet Support For Sedona.
What changes were proposed in this PR?
GeoParquet reader and writer support for Sedona is implemented in this PR.
- This implementation is based on a fork of the Spark 3.1 Parquet reader/writer, because the internal Parquet API changed heavily from Spark 3.2 to Spark 3.3.
- The fork is also modified to support Spark 3.0 - 3.3, but users should expect its non-geospatial behavior to be identical to that of Spark 3.0/3.1/3.2.
- VectorizedReader is removed from this reader because of compatibility issues; the Geometry type is not an atomic type anyway.
- We also have a cleaner version that supports Spark 3.3 only. It will be released once Sedona no longer needs to support Spark 3.0 - 3.2.
- Geometry filter on BBox is not implemented. It will be introduced in a follow-up PR.
- Spark 2.4 is not supported, and we have no plan to support it, since support for Sedona on Spark 2.4 will be dropped completely in the next major Sedona release.
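For reviewers who want to try the reader and writer, a minimal PySpark sketch of a round trip might look like the following. It assumes the data source is registered under the short name `geoparquet`, which this description does not state, and that the Sedona jars are available to the Spark session. The helper names `read_geoparquet` and `write_geoparquet` are hypothetical and exist only for illustration.

```python
from typing import Any

# Assumed data source short name; it is not stated in this PR description.
GEOPARQUET_FORMAT = "geoparquet"


def read_geoparquet(spark: Any, path: str) -> Any:
    """Load a GeoParquet file or directory into a DataFrame.

    `spark` is expected to be a SparkSession with Sedona on the classpath.
    """
    return spark.read.format(GEOPARQUET_FORMAT).load(path)


def write_geoparquet(df: Any, path: str, mode: str = "overwrite") -> None:
    """Write a DataFrame that contains a geometry column as GeoParquet."""
    df.write.format(GEOPARQUET_FORMAT).mode(mode).save(path)
```

With a Sedona-enabled `SparkSession`, calling `write_geoparquet(read_geoparquet(spark, "in.parquet"), "out.parquet")` would read a file and write it back; both paths are placeholders. Since BBox filtering is not implemented yet, any spatial predicate applied to the loaded DataFrame is evaluated after the data has been read.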
Additional notes:
- This PR will be merged after we release Sedona 1.2.1 and drop Spark 2.4 support.
- Before merging, we need to remove 'fail-fast: false' from the Scala/Java CI.
How was this patch tested?
Unit tests have been added.
Did this PR include necessary documentation updates?
- Yes, I have updated the documentation.