Bug: Flink data loss after failing to refresh table

Open Aireed opened this issue 1 year ago • 4 comments

Apache Iceberg version

1.1.0

Query engine

Flink

Please describe the bug 🐞

In the SnapshotProducer::commit call, if ops.refresh() fails, the table's metadata won't be refreshed to the latest version even though the new snapshot has been committed. In this situation, we lose the data in this commit.

Wouldn't it be better to throw the exception of ops.current() and make this commit fail?

@openinx PTAL

The situation in which I encountered the problem is as follows:

1. In the checkpoint 5 stage, Iceberg commits a new snapshot with sequence number 5.
2. In the checkpoint 6 stage, Iceberg commits a new snapshot with sequence number 6 and ops.refresh() fails, but the Flink task doesn't fail over and continues to execute.
3. In the checkpoint 7 stage, Iceberg again commits a new snapshot with sequence number 6.
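
To make the sequence above easier to reason about, here is a minimal toy model of it, not Iceberg code. `ToyCatalog`, `ToyCommitter`, and `simulate` are hypothetical names invented for this sketch, and the model deliberately ignores Iceberg's real commit protocol (optimistic concurrency checks, catalog locking, metadata files). It only illustrates how a committer that derives the next sequence number from a cached view can reuse a number when a post-commit refresh failure is swallowed, and how propagating the failure (as suggested above) would surface the problem instead.

```java
import java.util.ArrayList;
import java.util.List;

final class StaleRefreshSketch {

    /** Hypothetical stand-in for the catalog: holds the last committed sequence number. */
    static final class ToyCatalog {
        long lastSequenceNumber;
        boolean failNextRefresh;

        ToyCatalog(long lastSequenceNumber) {
            this.lastSequenceNumber = lastSequenceNumber;
        }

        /** Returns the latest state, or throws once if a failure was scheduled. */
        long load() {
            if (failNextRefresh) {
                failNextRefresh = false;
                throw new RuntimeException("simulated refresh failure");
            }
            return lastSequenceNumber;
        }
    }

    /** Hypothetical committer that only updates its cached view after each commit. */
    static final class ToyCommitter {
        private final ToyCatalog catalog;
        private final boolean failOnRefreshError;
        private long cachedSequenceNumber;

        ToyCommitter(ToyCatalog catalog, boolean failOnRefreshError) {
            this.catalog = catalog;
            this.failOnRefreshError = failOnRefreshError;
            this.cachedSequenceNumber = catalog.load();
        }

        /** Commits using the cached view, then tries to refresh that view. */
        long commit() {
            long next = cachedSequenceNumber + 1;
            catalog.lastSequenceNumber = next; // the commit itself lands
            try {
                cachedSequenceNumber = catalog.load();
            } catch (RuntimeException e) {
                if (failOnRefreshError) {
                    throw e; // surface the failure so the caller can fail over
                }
                // Swallowed: the cached view stays stale.
            }
            return next;
        }
    }

    /** Replays checkpoints 5, 6, and 7, with the refresh failing at checkpoint 6. */
    static List<Long> simulate(boolean failOnRefreshError) {
        ToyCatalog catalog = new ToyCatalog(4);
        ToyCommitter committer = new ToyCommitter(catalog, failOnRefreshError);
        List<Long> committed = new ArrayList<>();
        committed.add(committer.commit()); // checkpoint 5
        catalog.failNextRefresh = true;
        committed.add(committer.commit()); // checkpoint 6, refresh fails
        committed.add(committer.commit()); // checkpoint 7, based on the stale view
        return committed;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints [5, 6, 6]
    }
}
```

With `failOnRefreshError` set to `true`, the same replay throws at checkpoint 6 instead of continuing with a stale view. Whether failing the commit is the right behavior in Iceberg itself is exactly the open question in this issue.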

Aireed avatar Feb 19 '24 08:02 Aireed

@Aireed: Maybe the issue would be better handled if Flink ran the refresh before committing a new snapshot. Does this issue still happen with newer Iceberg versions?

pvary avatar Feb 24 '24 06:02 pvary

Hello. I'm leaving a comment because I'm experiencing a similar situation. I'm using Iceberg 1.4.3 and Flink 1.15, and I hit the same problem quite often.

When ops.refresh() fails, Flink leaves WARNING logs such as "Failed to load committed snapshot, skipping manifest clean-up" and "Failed to load committed snapshot: omitting sequence number from notifications". In this case, metadata.json is generated but it is not linked to the Hive Metastore, so the metadata is dangling. This issue happens very frequently for me (at least once a day).

I also checked the Iceberg code for version 1.5.0 and the development branch, but as far as I can tell there is no change in this area.

maekchi avatar Mar 22 '24 15:03 maekchi

@maekchi, @Aireed: Which catalog are you using?

The SnapshotProducer constructor uses ops.current() to refresh the base snapshot, see: https://github.com/apache/iceberg/blob/d6c8358ff26957c9234580addb03a0db1e441c4d/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L111

This should take care of refreshing the current snapshot when the new SnapshotProducer/snapshot is created.

pvary avatar Mar 26 '24 16:03 pvary