iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

API: implement types timestamp_ns and timestamptz_ns

Open jacobmarble opened this issue 1 year ago • 38 comments

Helps #8657

This change adds field ChronoUnit unit to TimestampType, such that TimestampType now represents four specified types:

  • timestamp (existing)
  • timestamptz (existing)
  • timestamp_ns (new #8683)
  • timestamptz_ns (new #8683)

Note that TimestampType.with[out]Zone() are marked as deprecated in this change. In future PRs, I'll remove usage of these static methods.

jacobmarble avatar Nov 08 '23 18:11 jacobmarble

Do these need to be addressed in this PR?

TestSpark3Util > testDescribeSortOrder FAILED
    org.junit.ComparisonFailure: Sort order isn't correct. expected:<[hours(time) DESC NULLS FIRST]> but was:<[]>
        at org.junit.Assert.assertEquals(Assert.java:117)
        at org.apache.iceberg.spark.TestSpark3Util.testDescribeSortOrder(TestSpark3Util.java:84)

TestFilteredScan > [format = parquet, vectorized = false, planningMode = LOCAL] > testHourPartitionedTimestampFilters[format = parquet, vectorized = false, planningMode = LOCAL] FAILED
    java.lang.AssertionError: Primitive value should be equal to expected expected:<8> but was:<5>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:119)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:68)
        at org.apache.iceberg.spark.source.TestFilteredScan.assertEqualsSafe(TestFilteredScan.java:573)
        at org.apache.iceberg.spark.source.TestFilteredScan.testHourPartitionedTimestampFilters(TestFilteredScan.java:374)

jacobmarble avatar Nov 08 '23 20:11 jacobmarble

Do these need to be addressed in this PR?

TestSpark3Util > testDescribeSortOrder FAILED
    org.junit.ComparisonFailure: Sort order isn't correct. expected:<[hours(time) DESC NULLS FIRST]> but was:<[]>
        at org.junit.Assert.assertEquals(Assert.java:117)
        at org.apache.iceberg.spark.TestSpark3Util.testDescribeSortOrder(TestSpark3Util.java:84)

TestFilteredScan > [format = parquet, vectorized = false, planningMode = LOCAL] > testHourPartitionedTimestampFilters[format = parquet, vectorized = false, planningMode = LOCAL] FAILED
    java.lang.AssertionError: Primitive value should be equal to expected expected:<8> but was:<5>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:119)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:68)
        at org.apache.iceberg.spark.source.TestFilteredScan.assertEqualsSafe(TestFilteredScan.java:573)
        at org.apache.iceberg.spark.source.TestFilteredScan.testHourPartitionedTimestampFilters(TestFilteredScan.java:374)

@jacobmarble are you sure those failures aren't caused by the changes introduced in this PR?

nastra avatar Nov 14 '23 07:11 nastra

https://github.com/apache/iceberg/actions/runs/7717197684/job/21172136322?pr=9008 looks like a spurious failure? All the rest passed, and even spark-3x-java-17-tests (3.5, 2.13) passes when I run locally:

% JAVA_HOME=/usr/lib64/jvm/java-17-openjdk-17 SPARK_LOCAL_IP=localhost ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.12 -DhiveVersions= -DflinkVersions= :iceberg-spark:iceberg-spark-3.5_2.12:check  -Pquick=true -x javadoc
> Task :iceberg-data:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:scalastyleMainCheck
Processed 6 file(s)
Found 0 errors
Found 0 warnings
Finished in 802 ms

> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileScala
Unexpected javac output: warning: [options] bootstrap class path not set in conjunction with -source 8
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:test
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 37m 54s
40 actionable tasks: 11 executed, 29 up-to-date

epgif avatar Feb 03 '24 01:02 epgif

I wonder why not using something similar to what we have for decimal with (P,S) for timestamp ? If we want to have "open precision" for timestamp we could imagine to have second/millisecond/microsecond/nanosecond/picosecond/...

For backward compatible, we can keep timestamp/timestamptz and add in Spec V3 'timestampp/timestampptz`.

Just wondering 😄

jbonofre avatar Feb 05 '24 17:02 jbonofre

@rdblue can you take another look at this change?

jacobmarble avatar Feb 29 '24 18:02 jacobmarble

Thanks for the reviews, @rdblue and @Fokko! Please have another look.

epgif avatar Mar 26 '24 18:03 epgif

@rdblue Thanks for the review! Please have another look.

epgif avatar Apr 18 '24 15:04 epgif

@epgif can you please address the test failures?

nastra avatar Apr 25 '24 12:04 nastra

@epgif can you please address the test failures?

Done. Thanks!

epgif avatar Apr 25 '24 17:04 epgif

@epgif can you please address the test failures?

@nastra do you intend to review this pull request further?

jacobmarble avatar May 06 '24 16:05 jacobmarble

@jacobmarble sorry for the delay here. I'm traveling this week and should be able to get to this PR after the Iceberg summit

nastra avatar May 06 '24 23:05 nastra

@nastra is this a good week for you to review?

jacobmarble avatar May 20 '24 19:05 jacobmarble

This should be ready to go. Thanks!

epgif avatar May 21 '24 19:05 epgif

@nastra or @rdblue - friendly reminder to review this

jacobmarble avatar May 28 '24 15:05 jacobmarble

Friendly ping @nastra @rdblue :)

epgif avatar Jun 04 '24 15:06 epgif

Thank you for the quick review @nastra! I believe everything is addressed now. Please have another look when you can.

Thanks!

epgif avatar Jun 06 '24 17:06 epgif

@nastra is there anything else to do on this pull request before merging?

jacobmarble avatar Jun 17 '24 15:06 jacobmarble

@jacobmarble I'll try to do another round this week once merge conflicts have been resolved. Also I think it would be good to get a review from @rdblue on this one

nastra avatar Jun 18 '24 11:06 nastra

@jacobmarble I'll try to do another round this week once merge conflicts have been resolved. Also I think it would be good to get a review from @rdblue on this one

Conflicts resolved. Thanks, @nastra!

epgif avatar Jun 18 '24 15:06 epgif

@nastra I thought the review switched to you because @rdblue was too busy?

This PR, which implements nanoseconds timestamps, is now 7 months old, and the spec change was merged 8 months ago.

@Fokko @rdblue @nastra is there any background concern about this PR that has not been discussed in the PR conversation?

jacobmarble avatar Jul 01 '24 15:07 jacobmarble

Hey @jacobmarble thanks for being patient here. There's no particular concern but I've been busy with other things and it was difficult to find a big block of time to re-review this PR. I'll do a final pass over the PR this week and @amogh-jahagirdar is also planning on taking a look.

nastra avatar Jul 03 '24 15:07 nastra

Thanks @amogh-jahagirdar! I believe I've addressed all the comments.

Please note that @jacobmarble and I are both in the US and will be out of the office Thursday and Friday.

We look forward to your reply next week.

Thanks!

epgif avatar Jul 04 '24 03:07 epgif

Can you please also add the new type(s) to TestSchemaConversions. I also wonder whether we need to test this similarly to TestTimestampsProjection. Given that this type is for v3, wouldn't a v2 writer just blindly write this new type if it's being used anywhere and thus break forward compability?

nastra avatar Jul 08 '24 13:07 nastra

Given that this type is for v3, wouldn't a v2 writer just blindly write this new type if it's being used anywhere and thus break forward compability?

@nastra I don't think I understand your question, would you mind sharing an example?

It seems to me that any writer that doesn't recognize a particular type (in this case timestamp_ns or timestamptz_ns) should fail when it reads the schema in the table metadata. How does the writer "blindly" write to an unimplemented type?

jacobmarble avatar Jul 08 '24 22:07 jacobmarble

Given that this type is for v3, wouldn't a v2 writer just blindly write this new type if it's being used anywhere and thus break forward compability?

@nastra I don't think I understand your question, would you mind sharing an example?

It seems to me that any writer that doesn't recognize a particular type (in this case timestamp_ns or timestamptz_ns) should fail when it reads the schema in the table metadata. How does the writer "blindly" write to an unimplemented type?

@jacobmarble currently nothing is preventing this new type to be written with a V2 writer, meaning that one could use this new type in a schema and write it with a V2 writer. Now an older Iceberg client tries to read this using a V2 reader and will fail.

nastra avatar Jul 09 '24 07:07 nastra

I've marked this PR to go into 1.7.0. @jacobmarble could you please open a follow-up issue to tackle what I mentioned in https://github.com/apache/iceberg/pull/9008#issuecomment-2216824071?

nastra avatar Jul 11 '24 06:07 nastra

overall this LGTM once comments in TestBucketing have been addressed

I've addressed these.

Thanks!

epgif avatar Jul 11 '24 14:07 epgif

I've marked this PR to go into 1.7.0. @jacobmarble could you please open a follow-up issue to tackle what I mentioned in #9008 (comment)?

@nastra I've opened a new issue. I tried to keep it simple; is it too simple? https://github.com/apache/iceberg/issues/10775

jacobmarble avatar Jul 24 '24 17:07 jacobmarble

@nastra what should be our expectation from here? Now that 1.6.0 is released, is this PR merged into main, or does someone else need to review it first?

jacobmarble avatar Jul 24 '24 17:07 jacobmarble

@amogh-jahagirdar would you like to take another look?

nastra avatar Jul 25 '24 15:07 nastra