API: implement types timestamp_ns and timestamptz_ns
Helps #8657
This change adds a field ChronoUnit unit to TimestampType, such that TimestampType now represents four specified types:
- timestamp (existing)
- timestamptz (existing)
- timestamp_ns (new, #8683)
- timestamptz_ns (new, #8683)
Note that TimestampType.withZone() and TimestampType.withoutZone() are marked as deprecated in this change. In future PRs, I'll remove usage of these static methods.
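Roughly, the shape of the change (sketch only; the field and factory-method names below are illustrative, not necessarily the exact API in this PR):

import java.time.temporal.ChronoUnit;

// Illustrative sketch only: a TimestampType carrying a ChronoUnit so that one
// class can represent micro- and nanosecond variants, with and without a zone.
public class TimestampType {
  private final boolean adjustToUTC;
  private final ChronoUnit unit; // MICROS for timestamp/timestamptz, NANOS for the _ns types

  private TimestampType(boolean adjustToUTC, ChronoUnit unit) {
    this.adjustToUTC = adjustToUTC;
    this.unit = unit;
  }

  // hypothetical factory methods, one per specified type
  public static TimestampType microsWithZone() {
    return new TimestampType(true, ChronoUnit.MICROS);
  }

  public static TimestampType microsWithoutZone() {
    return new TimestampType(false, ChronoUnit.MICROS);
  }

  public static TimestampType nanosWithZone() {
    return new TimestampType(true, ChronoUnit.NANOS);
  }

  public static TimestampType nanosWithoutZone() {
    return new TimestampType(false, ChronoUnit.NANOS);
  }

  public ChronoUnit unit() {
    return unit;
  }

  public boolean shouldAdjustToUTC() {
    return adjustToUTC;
  }
}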
Do these need to be addressed in this PR?
TestSpark3Util > testDescribeSortOrder FAILED
org.junit.ComparisonFailure: Sort order isn't correct. expected:<[hours(time) DESC NULLS FIRST]> but was:<[]>
at org.junit.Assert.assertEquals(Assert.java:117)
at org.apache.iceberg.spark.TestSpark3Util.testDescribeSortOrder(TestSpark3Util.java:84)
TestFilteredScan > [format = parquet, vectorized = false, planningMode = LOCAL] > testHourPartitionedTimestampFilters[format = parquet, vectorized = false, planningMode = LOCAL] FAILED
java.lang.AssertionError: Primitive value should be equal to expected expected:<8> but was:<5>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:120)
at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:119)
at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:68)
at org.apache.iceberg.spark.source.TestFilteredScan.assertEqualsSafe(TestFilteredScan.java:573)
at org.apache.iceberg.spark.source.TestFilteredScan.testHourPartitionedTimestampFilters(TestFilteredScan.java:374)
@jacobmarble are you sure those failures aren't caused by the changes introduced in this PR?
https://github.com/apache/iceberg/actions/runs/7717197684/job/21172136322?pr=9008 looks like a spurious failure? All the rest passed, and even spark-3x-java-17-tests (3.5, 2.13) passes when I run locally:
% JAVA_HOME=/usr/lib64/jvm/java-17-openjdk-17 SPARK_LOCAL_IP=localhost ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.12 -DhiveVersions= -DflinkVersions= :iceberg-spark:iceberg-spark-3.5_2.12:check -Pquick=true -x javadoc
> Task :iceberg-data:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
> Task :iceberg-spark:iceberg-spark-3.5_2.12:scalastyleMainCheck
Processed 6 file(s)
Found 0 errors
Found 0 warnings
Finished in 802 ms
> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileScala
Unexpected javac output: warning: [options] bootstrap class path not set in conjunction with -source 8
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning.
> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
> Task :iceberg-spark:iceberg-spark-3.5_2.12:test
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 37m 54s
40 actionable tasks: 11 executed, 29 up-to-date
I wonder why we aren't using something similar to what we have for decimal with (P, S) for timestamp? If we want to have "open precision" for timestamp, we could imagine having second/millisecond/microsecond/nanosecond/picosecond/... For backward compatibility, we can keep timestamp/timestamptz and add timestampp/timestampptz in Spec V3.
Just wondering 😄
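To make the analogy concrete, something like this hypothetical precision-parameterized type (pure sketch, nothing like it exists in the spec; it only illustrates the alternative):

import java.time.temporal.ChronoUnit;

// Hypothetical alternative: a timestamp parameterized by precision, similar to
// decimal(P, S), instead of separate timestamp and timestamp_ns types.
public class ParameterizedTimestampType {
  private final boolean adjustToUTC;
  private final ChronoUnit precision; // SECONDS, MILLIS, MICROS, NANOS, ...

  public ParameterizedTimestampType(boolean adjustToUTC, ChronoUnit precision) {
    this.adjustToUTC = adjustToUTC;
    this.precision = precision;
  }

  @Override
  public String toString() {
    // e.g. "timestamptz(Nanos)" rather than a distinct timestamptz_ns type
    return (adjustToUTC ? "timestamptz" : "timestamp") + "(" + precision + ")";
  }
}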
@rdblue can you take another look at this change?
Thanks for the reviews, @rdblue and @Fokko! Please have another look.
@rdblue Thanks for the review! Please have another look.
@epgif can you please address the test failures?
Done. Thanks!
@nastra do you intend to review this pull request further?
@jacobmarble sorry for the delay here. I'm traveling this week and should be able to get to this PR after the Iceberg summit
@nastra is this a good week for you to review?
This should be ready to go. Thanks!
@nastra or @rdblue - friendly reminder to review this
Friendly ping @nastra @rdblue :)
Thank you for the quick review @nastra! I believe everything is addressed now. Please have another look when you can.
Thanks!
@nastra is there anything else to do on this pull request before merging?
@jacobmarble I'll try to do another round this week once merge conflicts have been resolved. Also I think it would be good to get a review from @rdblue on this one
Conflicts resolved. Thanks, @nastra!
@nastra I thought the review switched to you because @rdblue was too busy?
This PR, which implements nanosecond timestamps, is now 7 months old, and the spec change was merged 8 months ago.
@Fokko @rdblue @nastra is there any background concern about this PR that has not been discussed in the PR conversation?
Hey @jacobmarble thanks for being patient here. There's no particular concern but I've been busy with other things and it was difficult to find a big block of time to re-review this PR. I'll do a final pass over the PR this week and @amogh-jahagirdar is also planning on taking a look.
Thanks @amogh-jahagirdar! I believe I've addressed all the comments.
Please note that @jacobmarble and I are both in the US and will be out of the office Thursday and Friday.
We look forward to your reply next week.
Thanks!
Can you please also add the new type(s) to TestSchemaConversions? I also wonder whether we need to test this similarly to TestTimestampsProjection.
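Roughly the kind of coverage I have in mind (sketch only, reusing the illustrative factory methods from the PR description above, not the actual TestSchemaConversions or TestTimestampsProjection code):

import static org.junit.Assert.assertEquals;

import java.time.temporal.ChronoUnit;
import org.junit.Test;

// Illustrative sketch: assert that the new nanosecond variants carry the
// expected unit, alongside the existing microsecond types.
public class TimestampNanosSketchTest {
  @Test
  public void nanosTypesCarryNanosecondUnit() {
    assertEquals(ChronoUnit.NANOS, TimestampType.nanosWithZone().unit());
    assertEquals(ChronoUnit.NANOS, TimestampType.nanosWithoutZone().unit());
    assertEquals(ChronoUnit.MICROS, TimestampType.microsWithZone().unit());
    assertEquals(ChronoUnit.MICROS, TimestampType.microsWithoutZone().unit());
  }
}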
Given that this type is for v3, wouldn't a v2 writer just blindly write this new type if it's used anywhere, and thus break forward compatibility?
@nastra I don't think I understand your question; would you mind sharing an example? It seems to me that any writer that doesn't recognize a particular type (in this case timestamp_ns or timestamptz_ns) should fail when it reads the schema in the table metadata. How does the writer "blindly" write to an unimplemented type?
@jacobmarble currently nothing prevents this new type from being written with a V2 writer, meaning that one could use this new type in a schema and write it with a V2 writer. An older Iceberg client that then tries to read it with a V2 reader will fail.
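As a concrete illustration of the gap, one possible guard (hypothetical class and method names, not necessarily what the follow-up should implement) is to reject the new types whenever the table's format version is below 3:

import java.util.List;

// Hypothetical guard: refuse timestamp_ns / timestamptz_ns columns in a table
// whose format version is below 3, so a V2 writer fails fast instead of
// producing metadata that older readers cannot parse.
public class NanosTimestampGuard {
  public static void validate(int formatVersion, List<String> columnTypes) {
    for (String type : columnTypes) {
      boolean isNanos = "timestamp_ns".equals(type) || "timestamptz_ns".equals(type);
      if (isNanos && formatVersion < 3) {
        throw new IllegalStateException(
            String.format(
                "Invalid type for v%d table: %s requires format version 3 or higher",
                formatVersion, type));
      }
    }
  }
}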
I've marked this PR to go into 1.7.0. @jacobmarble could you please open a follow-up issue to tackle what I mentioned in https://github.com/apache/iceberg/pull/9008#issuecomment-2216824071?
Overall this LGTM once the comments in TestBucketing have been addressed.
I've addressed these.
Thanks!
@nastra I've opened a new issue. I tried to keep it simple; is it too simple? https://github.com/apache/iceberg/issues/10775
@nastra what should our expectation be from here? Now that 1.6.0 is released, can this PR be merged into main, or does someone else need to review it first?
@amogh-jahagirdar would you like to take another look?