iceberg
iceberg copied to clipboard
Flink: Move ParquetReader to LogicalTypeAnnotationVisitor
This will enable nanosecond timestamp support, since the OriginalType does not represent the nanosecond timestamp.
Could we add a test too? Minimally for the new timestamp?
@pvary There are already quite a few tests: https://github.com/apache/iceberg/blob/main/flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java
This writes and then reads a Parquet files using the visitor. I've added another test, but it is very hard to assert the readers since they are not public :(
@pvary There are already quite a few tests: https://github.com/apache/iceberg/blob/main/flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java
This writes and then reads a Parquet files using the visitor. I've added another test, but it is very hard to assert the readers since they are not public :(
Does this test both the ns, and the ms timestamps?
Does this test both the ns, and the ms timestamps?
No, not yet. I want to do that in a separate PR, but this change needs to be done before that. In Parquet, there is no OriginalType
(Deprecated) for the Nanosecond timestamp, therefore we have to use the LogicalTypeAnnotation
instead. Once https://github.com/apache/iceberg/pull/9008 has been merged, we can add support for Flink as well 👍
https://github.com/apache/iceberg/pull/9719/files#diff-8d8e70413502d7efe586124264ffb87f49070dee35ac835a8ecdaac03df70196R273-R287
Thanks @Fokko for the PR! Please do not forget the PR for the other Flink versions as well.
Thanks @pvary for the review 👍