orc icon indicating copy to clipboard operation
orc copied to clipboard

[WIP] Add support for geometry type

Open ffacs opened this issue 1 year ago • 6 comments

What changes were proposed in this pull request?

POC of geometry type supporting

Why are the changes needed?

Add support for geometry type

How was this patch tested?

UT passed

Was this patch authored or co-authored using generative AI tooling?

NO

ffacs avatar Sep 22 '24 16:09 ffacs

cc @wgtmac @dongjoon-hyun

ffacs avatar Sep 23 '24 13:09 ffacs

Thanks for implementing this! I will hold my review until the spec has been finalized.

wgtmac avatar Sep 24 '24 05:09 wgtmac

Thank you, @ffacs .

To @wgtmac , do you mean the ORC format PR? Or, Iceberg document? Otherwise, could you provide some pointers to track this part, the spec has been finalized?

Thanks for implementing this! I will hold my review until the spec has been finalized.

  • https://github.com/apache/orc-format/pull/18
  • https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.rt0cvesdzsj7

dongjoon-hyun avatar Sep 25 '24 05:09 dongjoon-hyun

https://github.com/apache/parquet-format/pull/240 is still under review and subject to change. I think we can wait until it has been finalized. @dongjoon-hyun

wgtmac avatar Sep 25 '24 06:09 wgtmac

Ya, I want to check ETA for this work. Do you think we can have this for Apache ORC 2.1?

dongjoon-hyun avatar Sep 25 '24 06:09 dongjoon-hyun

If 2.1.0 will be released around January 17, 2025, then I think yes.

wgtmac avatar Sep 25 '24 06:09 wgtmac

Happy New Year, @ffacs and @wgtmac .

We need to release Apache ORC 2.1.0 this month.

Given that the status of Apache Parquet community, it's not finished still, right?

  • https://github.com/apache/parquet-format/pull/240

Let me remove this from ORC 2.1.0 milestone for now, @wgtmac .

dongjoon-hyun avatar Jan 03 '25 16:01 dongjoon-hyun

For the record, we can have this at Apache ORC 3.0.0.

dongjoon-hyun avatar Jan 03 '25 16:01 dongjoon-hyun

Yes, the discussion of Geometry type on the Iceberg and Parquet side took longer time than expected. I don't think it will be closed soon. Moving it to 3.0.0 seems reasonable.

wgtmac avatar Jan 05 '25 13:01 wgtmac

Thank you for the confirmation, @wgtmac .

dongjoon-hyun avatar Jan 05 '25 16:01 dongjoon-hyun

It seems that we are ready to release Apache ORC Format v1.1.0 to support this PR.

  • https://github.com/apache/orc-format/milestone/2?closed=1

dongjoon-hyun avatar Apr 10 '25 00:04 dongjoon-hyun

Can we resume this since the main branch is using ORC Format v1.1, @ffacs and @wgtmac ?

Apache Iceberg v1.9.0 is also released with geo type.

  • https://iceberg.apache.org/releases/#190-release
    • https://github.com/apache/iceberg/pull/10981

dongjoon-hyun avatar Apr 29 '25 15:04 dongjoon-hyun

Can we resume this since the main branch is using ORC Format v1.1, @ffacs and @wgtmac ?

Apache Iceberg v1.9.0 is also released with geo type.

Yes, I'd start work on this these days.

ffacs avatar Apr 30 '25 03:04 ffacs

Thank you so much!

dongjoon-hyun avatar Apr 30 '25 12:04 dongjoon-hyun

Thank you for updating PR, @ffacs .

dongjoon-hyun avatar Jun 02 '25 18:06 dongjoon-hyun

@ffacs . Could you update the PR description and address my comments, please?

dongjoon-hyun avatar Jun 03 '25 15:06 dongjoon-hyun

Thank you for updating. Could you fix the compilation failure, @ffacs ?

Error:  /root/orc/java/core/src/java/org/apache/orc/impl/ColumnStatisticsImpl.java:[1968,9] cannot find symbol
  symbol:   variable Collections
  location: class org.apache.orc.impl.ColumnStatisticsImpl.GeospatialStatisticsImpl

dongjoon-hyun avatar Jun 05 '25 17:06 dongjoon-hyun

Could you run mvn spotless:apply to make Spotless happy?

Error:  Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.44.4:check (analyze-compile) on project orc-core: The following files had format violations:
Error:      src/java/org/apache/orc/impl/ColumnStatisticsImpl.java
Error:          @@ -48,12 +48,12 @@
Error:           import·java.time.chrono.ChronoLocalDate;
Error:           import·java.time.chrono.Chronology;
Error:           import·java.time.chrono.IsoChronology;
Error:          +import·java.util.ArrayList;
Error:          +import·java.util.Collections;
Error:           import·java.util.HashSet;
Error:           import·java.util.List;
Error:          -import·java.util.ArrayList;
Error:           import·java.util.Set;
Error:           import·java.util.TimeZone;
Error:          -import·java.util.Collections;
Error:           
Error:           
Error:           public·class·ColumnStatisticsImpl·implements·ColumnStatistics·{
Error:  Run 'mvn spotless:apply' to fix these violations.

dongjoon-hyun avatar Jun 06 '25 15:06 dongjoon-hyun

Thank you for updating, @ffacs .

dongjoon-hyun avatar Jun 10 '25 15:06 dongjoon-hyun

Thank you, @cxzl25 .

dongjoon-hyun avatar Jun 12 '25 14:06 dongjoon-hyun

Thank you @ffacs, @dongjoon-hyun, @wgtmac, and @cxzl25! Will be merging this now.

williamhyun avatar Jun 12 '25 16:06 williamhyun

Thank you, @williamhyun .

dongjoon-hyun avatar Jun 12 '25 19:06 dongjoon-hyun

Thank you @dongjoon-hyun @williamhyun @cxzl25 .

ffacs avatar Jun 15 '25 10:06 ffacs