iceberg
Bug: Flink data loss after failed to refresh table
Apache Iceberg version
1.1.0
Query engine
Flink
Please describe the bug 🐞
In the SnapshotProducer::commit call, if ops.refresh() fails, the table's metadata is not refreshed to the latest version even though the new snapshot has already been committed. In this situation, we lose the data in this commit.
Wouldn't it be better to throw the exception of ops.current() and make this commit fail?
@openinx PTAL
The situation in which I encountered the problem is as follows:
1. At checkpoint 5, Iceberg commits a new snapshot with sequence number 5.
2. At checkpoint 6, Iceberg commits a new snapshot with sequence number 6 and ops.refresh() fails, but the Flink task does not fail over and continues to execute.
3. At checkpoint 7, Iceberg again commits a new snapshot with sequence number 6.
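To make the timeline above concrete, here is a minimal toy model of it in Java. Everything in it is hypothetical: FakeOps, commitSnapshot, runCheckpoints and the failOnRefreshError flag are illustrative names, not Iceberg's TableOperations or SnapshotProducer, and the catalog's atomic swap check is not modeled. It only shows how a swallowed refresh failure leaves the cached metadata stale so that the next commit reuses the same sequence number, and what the proposed fail-fast behavior would look like instead.

```java
import java.util.ArrayList;
import java.util.List;

public class RefreshFailureSketch {

  /** Hypothetical stand-in for table operations; NOT Iceberg's TableOperations. */
  static class FakeOps {
    long committedSeq;        // sequence number of the latest committed snapshot
    long cachedSeq;           // sequence number the writer believes is current
    boolean failNextRefresh;  // test hook: make the next refresh() throw once

    FakeOps(long seq) {
      this.committedSeq = seq;
      this.cachedSeq = seq;
    }

    /** Returns the cached view without contacting the catalog. */
    long current() {
      return cachedSeq;
    }

    /** Publishes a new snapshot (the atomic swap check is deliberately not modeled). */
    void commit(long newSeq) {
      committedSeq = newSeq;
    }

    /** Reloads the cached view, or throws if a failure was injected. */
    long refresh() {
      if (failNextRefresh) {
        failNextRefresh = false;
        throw new IllegalStateException("simulated refresh failure");
      }
      cachedSeq = committedSeq;
      return cachedSeq;
    }
  }

  /** Commits one snapshot based on the cached state, then refreshes. */
  static long commitSnapshot(FakeOps ops, boolean failOnRefreshError) {
    long newSeq = ops.current() + 1;
    ops.commit(newSeq);
    try {
      ops.refresh();
    } catch (RuntimeException e) {
      if (failOnRefreshError) {
        throw e; // proposed behavior: surface the failure so the job can fail over
      }
      // reported behavior: log a warning and keep running with stale metadata
      System.out.println("WARN: refresh failed after commit: " + e.getMessage());
    }
    return newSeq;
  }

  /** Runs checkpoints 5, 6 and 7; the refresh fails only at checkpoint 6. */
  static List<Long> runCheckpoints(boolean failOnRefreshError) {
    FakeOps ops = new FakeOps(4);
    List<Long> sequenceNumbers = new ArrayList<>();
    for (int checkpoint = 5; checkpoint <= 7; checkpoint++) {
      ops.failNextRefresh = (checkpoint == 6);
      sequenceNumbers.add(commitSnapshot(ops, failOnRefreshError));
    }
    return sequenceNumbers;
  }

  public static void main(String[] args) {
    System.out.println(runCheckpoints(false)); // prints [5, 6, 6]
    try {
      runCheckpoints(true);
    } catch (IllegalStateException e) {
      System.out.println("commit failed at checkpoint 6: " + e.getMessage());
    }
  }
}
```

With failOnRefreshError set to false, the model reproduces the sequence 5, 6, 6 from the report. With it set to true, the exception propagates at checkpoint 6, which in a real job would presumably trigger a failover instead of a silent continuation.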
@Aireed: Maybe the issue would be better handled if Flink ran the refresh before committing a new snapshot. Does this issue still happen with newer Iceberg versions?
Hello. I'm leaving a comment because I'm experiencing a similar situation. I'm using Iceberg 1.4.3 and Flink 1.15, and I run into the same problem quite often.
When ops.refresh() fails, Flink leaves WARNING logs such as "Failed to load committed snapshot, skipping manifest clean-up" and "Failed to load committed snapshot: omitting sequence number from notifications".
In this case, metadata.json is generated but is not linked to the Hive Metastore, so the metadata is left dangling.
This issue happens very frequently for me (at least once a day).
I also checked the Iceberg code for version 1.5.0 and the development branch, and as far as I can tell nothing has changed in this area.
@maekchi, @Aireed: Which catalog are you using?
The SnapshotProducer constructor uses ops.current() to refresh the base snapshot, like:
https://github.com/apache/iceberg/blob/d6c8358ff26957c9234580addb03a0db1e441c4d/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L111
This should take care of refreshing the current snapshot when the new SnapshotProducer/snapshot is created.