
Issue with Metadata Reconciliation Between Iceberg and Delta Tables During Snapshot Updates

Open MrDerecho opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Please describe the bug 🐞

Description

I am encountering an issue while using xtable to perform updates from Iceberg to Delta tables. Here is the observed behavior:

  • Snapshot 0: The metadata between the Iceberg and Delta tables reconciles as expected.
  • Snapshot 1: Erroneous metadata is generated that includes "add" and "remove" actions that did not actually occur. This results in a lower row count in the Delta metadata than in the source Iceberg table.
  • Snapshot 2: The metadata appears to reconcile again and reflects the updated values accurately.
  • Snapshot 3: The issue recurs with similar discrepancies in the metadata.

Additional Context

  • This behavior has been observed consistently in 5 instances across a sample of 30 tables.
  • The issue occurs in the largest of these tables, with around 7 million files and 7.3 trillion records.
  • This table is "append-only"; the files that disappear or are removed in snapshot 1 are re-added in snapshot 2.
  • The issue seems cyclical, occurring on every alternate snapshot.
  • The only error/info messages found in the logs are "incremental sync is not safe from instant falling back to snapshot sync" and "truncated the string representation of a plan since it was too large".

Steps to Reproduce

  1. Use xtable to perform updates from Iceberg to Delta tables.
  2. Observe metadata reconciliation across snapshots.

Expected Behavior

The metadata between the Iceberg and Delta tables should reconcile accurately across all snapshots, without erroneous "add" or "remove" actions.

Actual Behavior

Alternate snapshots (e.g., snapshots 1 and 3) generate erroneous metadata with inaccurate "add" and "remove" actions, leading to a mismatch in row counts.

Environment

  • Tool: xtable
  • Source: Apache Iceberg
  • Destination: Delta Lake

Additional Notes

The issue might be related to how snapshots are processed or how metadata is generated.

Are you willing to submit PR?

  • [ ] I am willing to submit a PR!
  • [ ] I am willing to submit a PR but need help getting started!

Code of Conduct

MrDerecho, Dec 03 '24 21:12

Hi @MrDerecho, thanks for reporting the issue. Is it possible to share the Iceberg and Delta metadata folders for snapshots 1, 2, and 3? That would help in reproducing it through a unit test.

vinishjail97, Dec 05 '24 22:12

Thanks, @MrDerecho. Could you please confirm if the "erroneous" add and remove actions you're observing in the Delta commit log are pairs of add and remove actions for the same data files?
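One rough way to check this is to scan the JSON actions in the relevant Delta commit files and look for data files that show up on both the add and the remove side. The sketch below is not part of XTable: `file_name` and `paired_add_remove` are hypothetical helpers, the sample lines are made up, and matching on the trailing file name assumes file names are unique within the table.

```python
import json
import posixpath
from urllib.parse import unquote, urlparse


def file_name(path):
    """Trailing file name of a Delta action path (relative or fully qualified)."""
    return posixpath.basename(unquote(urlparse(path).path))


def paired_add_remove(action_lines):
    """Return file names that occur in both an add and a remove action.

    action_lines: JSON lines read from one or more _delta_log/*.json commit files.
    """
    added, removed = set(), set()
    for line in action_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "add" in action:
            added.add(file_name(action["add"]["path"]))
        elif "remove" in action:
            removed.add(file_name(action["remove"]["path"]))
    return sorted(added & removed)


# Made-up sample: the same file is removed under one path form and added under another.
sample_lines = [
    '{"commitInfo": {"operation": "WRITE"}}',
    '{"remove": {"path": "data/part-0001.parquet", "dataChange": true}}',
    '{"add": {"path": "s3://bucket/tbl/data/part-0001.parquet", "size": 10, "dataChange": true}}',
    '{"add": {"path": "data/part-0002.parquet", "size": 12, "dataChange": true}}',
]
print(paired_add_remove(sample_lines))
```

A non-empty result would mean the same data file appears as both added and removed in the commits you scanned.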

If so, I think we've encountered a few instances of this issue as well. It appears to be related to data file paths. XTable uses data file paths as keys for identifying state changes across commits, and there seems to be a code path where the paths for the same files do not match.
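To illustrate the failure mode, here is a minimal sketch of a path-keyed diff. It is not XTable's implementation: `diff_files` and `normalize` are hypothetical, and the absolute-versus-relative mismatch is only an assumed example of how two path strings for the same file could differ.

```python
def diff_files(previous, current):
    """Derive add/remove actions by treating each path string as the file's identity."""
    prev, curr = set(previous), set(current)
    return {"add": sorted(curr - prev), "remove": sorted(prev - curr)}


def normalize(path, table_root):
    """Hypothetical normalization: strip the table root so both path forms compare equal."""
    root = table_root.rstrip("/") + "/"
    return path[len(root):] if path.startswith(root) else path


root = "s3://bucket/tbl"
# Assumed scenario: the previous state holds absolute paths, the current one relative paths.
previous = ["s3://bucket/tbl/data/a.parquet", "s3://bucket/tbl/data/b.parquet"]
current = ["data/a.parquet", "data/b.parquet", "data/c.parquet"]

# Without normalization, unchanged files a and b show up as removed and re-added.
raw = diff_files(previous, current)

# With normalization, only the genuinely new file c is reported.
fixed = diff_files(
    [normalize(p, root) for p in previous],
    [normalize(p, root) for p in current],
)
print(raw)
print(fixed)
```

In this made-up case an append of one file turns into three adds and two removes, which is the kind of spurious add/remove pair described above.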

ashvina, Mar 11 '25 03:03