[Feature] [core] Iceberg: createMetadataWithoutBase should include full history of Paimon snapshots

Open nickdelnano opened this issue 4 months ago • 3 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Motivation

Code path: https://github.com/apache/paimon/blob/ff02f6bf3ceccf8dcac38bc58cf6db390509bd46/paimon-core/src/main/java/org/apache/paimon/iceberg/IcebergCommitCallback.java#L305C49-L305C59

createMetadataWithoutBase is called under certain conditions like

  • enabling Iceberg compatibility for the first time
  • committing Iceberg metadata after a previous commit failed in the Iceberg layer (for many possible reasons)

In some cases the Paimon table will have many snapshots already (e.g. snapshots 1 to 1000) and after calling createMetadataWithoutBase, only the latest Paimon commit will be synced to Iceberg.

By syncing the whole Paimon history to Iceberg, the Iceberg compatibility feature becomes suitable for production use cases that require Iceberg time travel.

My use case is this:

  • Sync MySQL tables to Paimon using Flink CDC
  • Tag daily Paimon snapshots automatically
  • Iceberg readers read daily snapshots

When a failure happens in the Iceberg committer, the metadata needs to be recreated on the next commit, and only the latest Paimon snapshot is included. The daily snapshots that the Iceberg readers depend on are then missing from Iceberg and cannot be recovered.

Solution

A simple solution may exist. Assume snapshot range [x, y]:

If creating metadata without base:

  • createMetadataWithoutBase for the earliest snapshot, x
  • for (i = x + 1; i <= y; i++) call createMetadataWithBase(i), since x itself was already written without a base
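
A minimal sketch of that replay order, for illustration only. `MetadataWriter`, `syncRange`, and `plan` are hypothetical names, not Paimon's real API; the actual `createMetadataWithoutBase` and `createMetadataWithBase` methods in `IcebergCommitCallback` take different arguments. The loop starts at x + 1 because x is already covered by the base-less call.

```java
import java.util.ArrayList;
import java.util.List;

public class FullHistorySync {

    /** Hypothetical stand-in for the Iceberg metadata writer; not Paimon's real API. */
    interface MetadataWriter {
        void createWithoutBase(long snapshotId);

        void createWithBase(long snapshotId);
    }

    /** Replays the snapshot range [earliest, latest] into Iceberg metadata, oldest first. */
    static void syncRange(long earliest, long latest, MetadataWriter writer) {
        if (earliest > latest) {
            throw new IllegalArgumentException("empty snapshot range");
        }
        // The earliest snapshot has no Iceberg metadata to build on.
        writer.createWithoutBase(earliest);
        // Every later snapshot builds on the metadata written for its predecessor.
        for (long id = earliest + 1; id <= latest; id++) {
            writer.createWithBase(id);
        }
    }

    /** Records the calls a sync would make, so the ordering can be inspected. */
    static List<String> plan(long earliest, long latest) {
        List<String> calls = new ArrayList<>();
        syncRange(earliest, latest, new MetadataWriter() {
            @Override
            public void createWithoutBase(long id) {
                calls.add("withoutBase(" + id + ")");
            }

            @Override
            public void createWithBase(long id) {
                calls.add("withBase(" + id + ")");
            }
        });
        return calls;
    }

    public static void main(String[] args) {
        // prints [withoutBase(1), withBase(2), withBase(3), withBase(4)]
        System.out.println(plan(1, 4));
    }
}
```

The number of metadata writes grows linearly with the size of the range, which is why the cost of replaying a long history matters.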

Anything else?

Consider making this feature opt-in via a configuration option, since syncing many Paimon snapshots to Iceberg may be costly and could cause the commit to exceed the Flink checkpoint timeout.

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

nickdelnano avatar Aug 20 '25 18:08 nickdelnano

@LsomeYeah what do you think about this issue and the suggested solution?

nickdelnano avatar Aug 20 '25 21:08 nickdelnano

@LsomeYeah what do you think about this issue and the suggested solution?

@nickdelnano I think this case is reasonable, and I also prefer adding a new option.

LsomeYeah avatar Aug 21 '25 02:08 LsomeYeah

Investigated this some. Syncing all Paimon snapshots to Iceberg looks possible. I realized I have another requirement in my use case around Paimon tags and Iceberg compatibility.

More details on my use case:

  • Sync MySQL tables to Paimon using Flink CDC
  • Tag daily Paimon snapshots automatically
  • Iceberg readers read daily snapshots

Table configuration:

  • snapshot.time-retained: 7d
  • tag.num-retained-max: 100
  • tag.automatic-creation: process-time
  • tag.creation-period: daily

Once snapshot.time-retained has elapsed, snapshots are expired from Paimon, but the daily tags still exist. I need all tags to be available in Iceberg. IcebergCommitCallback only considers the snapshots in the Paimon table, so this does not work. I will check on this next.
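
To make the gap concrete, here is a small self-contained sketch that lists the tags whose snapshot has already expired. It does not use Paimon's API: `tagsMissingFromSnapshots` is a hypothetical helper, and the snapshot IDs and tag names are made-up sample data. It only illustrates which tags a snapshot-only sync cannot see.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TagCoverage {

    /** Returns the tags whose snapshot ID is no longer among the retained snapshots. */
    static Map<String, Long> tagsMissingFromSnapshots(
            Set<Long> retainedSnapshotIds, Map<String, Long> tagToSnapshotId) {
        Map<String, Long> missing = new TreeMap<>();
        for (Map.Entry<String, Long> tag : tagToSnapshotId.entrySet()) {
            if (!retainedSnapshotIds.contains(tag.getValue())) {
                missing.put(tag.getKey(), tag.getValue());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up data: only snapshots 900..1000 survive snapshot expiration.
        Set<Long> retained = new TreeSet<>();
        for (long id = 900; id <= 1000; id++) {
            retained.add(id);
        }

        // Made-up daily tags; the first two point at snapshots that have expired.
        Map<String, Long> tags = new TreeMap<>();
        tags.put("2025-08-01", 120L);
        tags.put("2025-08-15", 610L);
        tags.put("2025-08-28", 950L);

        // prints {2025-08-01=120, 2025-08-15=610}
        System.out.println(tagsMissingFromSnapshots(retained, tags));
    }
}
```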

nickdelnano avatar Aug 29 '25 04:08 nickdelnano