[WIP] Add support for geometry type
What changes were proposed in this pull request?
POC of geometry type support
Why are the changes needed?
Add support for geometry type
How was this patch tested?
UT passed
Was this patch authored or co-authored using generative AI tooling?
NO
cc @wgtmac @dongjoon-hyun
Thanks for implementing this! I will hold my review until the spec has been finalized.
Thank you, @ffacs .
To @wgtmac , do you mean the ORC format PR? Or the Iceberg document? Otherwise, could you provide some pointers to track whether the spec has been finalized?
- https://github.com/apache/orc-format/pull/18
- https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.rt0cvesdzsj7
https://github.com/apache/parquet-format/pull/240 is still under review and subject to change. I think we can wait until it has been finalized. @dongjoon-hyun
Ya, I want to check the ETA for this work. Do you think we can have this for Apache ORC 2.1?
If 2.1.0 will be released around January 17, 2025, then I think yes.
Happy New Year, @ffacs and @wgtmac .
We need to release Apache ORC 2.1.0 this month.
Given the status of the Apache Parquet community, it's still not finished, right?
- https://github.com/apache/parquet-format/pull/240
Let me remove this from ORC 2.1.0 milestone for now, @wgtmac .
For the record, we can have this in Apache ORC 3.0.0.
Yes, the discussion of the Geometry type on the Iceberg and Parquet side took longer than expected. I don't think it will be closed soon. Moving it to 3.0.0 seems reasonable.
Thank you for the confirmation, @wgtmac .
It seems that we are ready to release Apache ORC Format v1.1.0 to support this PR.
- https://github.com/apache/orc-format/milestone/2?closed=1
Can we resume this since the main branch is using ORC Format v1.1, @ffacs and @wgtmac ?
Apache Iceberg v1.9.0 is also released with geo type.
- https://iceberg.apache.org/releases/#190-release
- https://github.com/apache/iceberg/pull/10981
Yes, I'll start working on this in the next few days.
Thank you so much!
Thank you for updating the PR, @ffacs .
@ffacs . Could you update the PR description and address my comments, please?
Thank you for updating. Could you fix the compilation failure, @ffacs ?
Error: /root/orc/java/core/src/java/org/apache/orc/impl/ColumnStatisticsImpl.java:[1968,9] cannot find symbol
symbol: variable Collections
location: class org.apache.orc.impl.ColumnStatisticsImpl.GeospatialStatisticsImpl
Could you run mvn spotless:apply to make Spotless happy?
Error: Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.44.4:check (analyze-compile) on project orc-core: The following files had format violations:
Error: src/java/org/apache/orc/impl/ColumnStatisticsImpl.java
Error: @@ -48,12 +48,12 @@
Error: import·java.time.chrono.ChronoLocalDate;
Error: import·java.time.chrono.Chronology;
Error: import·java.time.chrono.IsoChronology;
Error: +import·java.util.ArrayList;
Error: +import·java.util.Collections;
Error: import·java.util.HashSet;
Error: import·java.util.List;
Error: -import·java.util.ArrayList;
Error: import·java.util.Set;
Error: import·java.util.TimeZone;
Error: -import·java.util.Collections;
Error:
Error:
Error: public·class·ColumnStatisticsImpl·implements·ColumnStatistics·{
Error: Run 'mvn spotless:apply' to fix these violations.
Thank you for updating, @ffacs .
Thank you, @cxzl25 .
Thank you @ffacs, @dongjoon-hyun, @wgtmac, and @cxzl25! Will be merging this now.
Thank you, @williamhyun .
Thank you @dongjoon-hyun @williamhyun @cxzl25 .