iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Flink: Move ParquetReader to LogicalTypeAnnotationVisitor

Open Fokko opened this issue 1 year ago • 1 comments

This will enable nanosecond timestamp support, since the OriginalType does not represent the nanosecond timestamp.

Fokko avatar Feb 13 '24 19:02 Fokko

Could we add a test too? Minimally for the new timestamp?

pvary avatar Feb 24 '24 06:02 pvary

@pvary There are already quite a few tests: https://github.com/apache/iceberg/blob/main/flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java

This writes and then reads a Parquet files using the visitor. I've added another test, but it is very hard to assert the readers since they are not public :(

Fokko avatar Apr 22 '24 19:04 Fokko

@pvary There are already quite a few tests: https://github.com/apache/iceberg/blob/main/flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java

This writes and then reads a Parquet files using the visitor. I've added another test, but it is very hard to assert the readers since they are not public :(

Does this test both the ns, and the ms timestamps?

pvary avatar Apr 22 '24 20:04 pvary

Does this test both the ns, and the ms timestamps?

No, not yet. I want to do that in a separate PR, but this change needs to be done before that. In Parquet, there is no OriginalType (Deprecated) for the Nanosecond timestamp, therefore we have to use the LogicalTypeAnnotation instead. Once https://github.com/apache/iceberg/pull/9008 has been merged, we can add support for Flink as well 👍

https://github.com/apache/iceberg/pull/9719/files#diff-8d8e70413502d7efe586124264ffb87f49070dee35ac835a8ecdaac03df70196R273-R287

Fokko avatar Apr 22 '24 20:04 Fokko

Thanks @Fokko for the PR! Please do not forget the PR for the other Flink versions as well.

pvary avatar Apr 23 '24 10:04 pvary

Thanks @pvary for the review 👍

Fokko avatar Apr 26 '24 06:04 Fokko