Add Nanosecond Precision Support for Flink-Iceberg Integration

Open talatuyarer opened this issue 2 months ago • 10 comments

This PR fixes the issue where nanosecond precision timestamps were being truncated to millisecond precision when using Apache Flink with Apache Iceberg V3 tables. Now you can actually use those fancy TIMESTAMP(9) and TIMESTAMP_LTZ(9) types without losing precision!

The Problem

When inserting data with nanosecond precision using Flink SQL like:

INSERT INTO my_table VALUES (TIMESTAMP '2025-01-15 10:30:45.123456789');

The data would mysteriously lose precision and come back as:

2025-01-15 10:30:45.123456

Core Issues:

  • RowDataWrapper was converting all timestamps to microseconds regardless of precision
  • FlinkParquetWriters was losing nanosecond precision when writing to Parquet files
  • StructRowData and RowDataUtil were truncating precision when reading data back
  • FlinkTypeToType was always mapping TIMESTAMP(9) to microsecond Iceberg types (causing Flink to cast to TIMESTAMP(6))
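To make the truncation concrete, here is a minimal sketch that uses only java.time. It is not the code from this PR: the class and method names are hypothetical, and it only mimics what happens when a nanosecond timestamp is stored as epoch microseconds versus epoch nanoseconds.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not part of Flink or Iceberg.
final class TimestampPrecisionSketch {

    // Epoch microseconds: the integer division drops the last three fractional digits.
    static long toEpochMicros(LocalDateTime ts) {
        long seconds = ts.toEpochSecond(ZoneOffset.UTC);
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), ts.getNano() / 1_000);
    }

    // Epoch nanoseconds: all nine fractional digits are kept.
    static long toEpochNanos(LocalDateTime ts) {
        long seconds = ts.toEpochSecond(ZoneOffset.UTC);
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000_000L), ts.getNano());
    }

    static LocalDateTime fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanoOfSecond = ((int) Math.floorMod(micros, 1_000_000L)) * 1_000;
        return LocalDateTime.ofEpochSecond(seconds, nanoOfSecond, ZoneOffset.UTC);
    }

    static LocalDateTime fromEpochNanos(long nanos) {
        long seconds = Math.floorDiv(nanos, 1_000_000_000L);
        int nanoOfSecond = (int) Math.floorMod(nanos, 1_000_000_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanoOfSecond, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.parse("2025-01-15T10:30:45.123456789");
        // Round trip through microseconds loses the trailing "789".
        System.out.println(fromEpochMicros(toEpochMicros(ts)));
        // Round trip through nanoseconds preserves the full value.
        System.out.println(fromEpochNanos(toEpochNanos(ts)));
    }
}
```

The microsecond round trip reproduces the truncated value from the description, which is why every conversion step on the write and read paths has to carry the nanosecond part through.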

Data Format Support:

  • ✅ Parquet: Fixed writers and readers to preserve nanosecond precision
  • ✅ Avro: Already working perfectly (verified with tests)
  • ❌ ORC: Still needs fixing (truncates to microseconds)

If you want to test it yourself:

-- Create table with nanosecond precision
CREATE TABLE test_table (
    id BIGINT,
    ts TIMESTAMP(9),
    ts_tz TIMESTAMP_LTZ(9)
) WITH (
    'connector' = 'iceberg',
    'catalog-name' = 'hadoop_catalog',
    'warehouse' = 'gs://my-bucket/warehouse',
    'format-version' = '3'
);

-- Insert with nanosecond precision
INSERT INTO test_table VALUES 
(1, TIMESTAMP '2025-01-15 10:30:45.123456789', TIMESTAMP '2025-01-15 10:30:45.123456789');

-- Query and verify precision is preserved
SELECT * FROM test_table;
-- Should show: 2025-01-15 10:30:45.123456789 (not truncated!)

talatuyarer avatar Oct 03 '25 09:10 talatuyarer

@mxm and @pvary Could you review this PR when you have time?

I assumed Flink already had nanosecond support for Iceberg V3. It turned out it doesn't, so this PR adds that support.

talatuyarer avatar Oct 03 '25 09:10 talatuyarer

@rodmeneses was working on this some time ago. You can get inspiration from the PR he created: https://github.com/apache/iceberg/pull/11348

pvary avatar Oct 03 '25 15:10 pvary

@pvary I updated the PR and fixed all issues. I also checked @rodmeneses's PR and included the watermark extraction changes in my PR. I implemented nanosecond support in all the places I could find. I believe this time Flink's nanosecond support is complete. It is ready for your review 😄

talatuyarer avatar Oct 06 '25 06:10 talatuyarer

@talatuyarer: Could we please limit the changes to Flink 2.1, so it is easier to review, and easier to update based on the review comments?

Also, could you please review the generated tests, and make sure that the relevant tests are there and we don't have tests which are redundant?

One more suggestion: DataGenerators is used in tests to automatically add tests for the types. Please add timestamp nano to them.

pvary avatar Oct 06 '25 08:10 pvary

Thank you @pvary for reviewing this. As you suggested, I removed all other versions except 2.1 and also addressed your comments:

  • Throw unsupported for Timestamp Precision if it is bigger than 9
  • Added Timestamp Nano to DataGenerators
  • Improved Test Documentation
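
As a rough illustration of the first item, the precision check could look like the sketch below. This is not the PR's actual code: the class, enum, and method names are hypothetical, and the cutoff (precision up to 6 maps to microseconds, 7 to 9 maps to nanoseconds) is an assumption based on the description above.

```java
// Hypothetical sketch; the names here are illustrative and not Flink or Iceberg API.
final class TimestampPrecisionMapping {

    enum TimestampUnit { MICROS, NANOS }

    // Assumption: precision 0..6 fits a microsecond type, 7..9 needs a nanosecond type.
    static TimestampUnit unitForPrecision(int precision) {
        if (precision < 0 || precision > 9) {
            throw new UnsupportedOperationException(
                "Unsupported timestamp precision: " + precision);
        }
        return precision <= 6 ? TimestampUnit.MICROS : TimestampUnit.NANOS;
    }

    public static void main(String[] args) {
        for (int precision : new int[] {3, 6, 9}) {
            System.out.println("TIMESTAMP(" + precision + ") -> " + unitForPrecision(precision));
        }
    }
}
```

Failing fast on an out-of-range precision avoids silently rounding a value the table format cannot represent.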

Please let me know if you see anything missing.

talatuyarer avatar Oct 07 '25 02:10 talatuyarer

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Nov 17 '25 00:11 github-actions[bot]

What is the status of this one @talatuyarer?

pvary avatar Nov 17 '25 08:11 pvary

I think we need to fix this issue. @talatuyarer Could you please let me know the current progress? If there are no further updates, I'd like to continue working on this PR. Thanks!

Guosmilesmile avatar Nov 29 '25 10:11 Guosmilesmile

@pvary and @Guosmilesmile I will update this PR next week. I am working on it. Sorry this took longer than I expected.

talatuyarer avatar Nov 29 '25 21:11 talatuyarer

@mxm, @Guosmilesmile: Could you please review?

pvary avatar Dec 11 '25 17:12 pvary