Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit
Skips the uncommitted-manifest cleanup that FastAppend runs after a successful commit when the commit succeeded on its first attempt. Without a retry, no manifests from an earlier attempt can have been orphaned, so the cleanup has nothing to do. This speeds up the happy path of commits by removing 2 unnecessary reads:
- table metadata READ
- manifest list READ
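A minimal, hypothetical Java sketch of the idea follows. It is not the actual `SnapshotProducer` change. The producer counts commit attempts and runs the uncommitted-manifest cleanup only when more than one attempt was made, while a failed commit still cleans up everything. `SketchProducer`, `attemptCommit`, and the event strings are illustrative names. The real retry loop, metadata reads, and file deletion are simulated by appending to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a snapshot producer's commit path.
// None of these names are Iceberg APIs; they only illustrate the control flow.
class SketchProducer {
    private final AtomicInteger attempts = new AtomicInteger(0);
    private final int failuresBeforeSuccess;
    // Records which steps ran, standing in for real I/O.
    final List<String> events = new ArrayList<>();

    SketchProducer(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    // Simulates one commit attempt that fails (e.g. because of a concurrent
    // commit) the first `failuresBeforeSuccess` times, then succeeds.
    private boolean attemptCommit() {
        int n = attempts.incrementAndGet();
        events.add("attempt-" + n);
        return n > failuresBeforeSuccess;
    }

    void commit(int maxAttempts) {
        boolean committed = false;
        for (int i = 0; i < maxAttempts && !committed; i++) {
            committed = attemptCommit();
        }
        if (!committed) {
            // Failure path: cleanup is never skipped.
            events.add("cleanAll");
            throw new IllegalStateException(
                "commit failed after " + attempts.get() + " attempts");
        }
        // Only a retry can leave manifests from an earlier attempt orphaned,
        // so the extra metadata and manifest list reads run only when attempts > 1.
        if (attempts.get() > 1) {
            events.add("cleanUncommitted");
        }
    }

    int attempts() {
        return attempts.get();
    }
}
```

On the happy path (`new SketchProducer(0)`), `commit` records a single attempt and no cleanup; with two simulated failures it records three attempts followed by `cleanUncommitted`.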
Context from Slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1718381807647999
We are ingesting streaming data using a Java service that performs Iceberg FastAppend commits. We noticed that about 20% (YMMV) of the FastAppend commit time for our use case is spent on non-required cleanup operations, specifically this code, which FastAppend inherits from SnapshotProducer: https://github.com/apache/iceberg/blob/apache-iceberg-1.5.2/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L422-L439
Testing:
- I manually tested this by running `TestFastAppend.testRecoveryWithManifestList` and verifying that the cleanup steps run only when a retry occurs.
Notes:
- We do not skip cleanup operations on commit failures (see `cleanAll()`).
- The diff is best viewed with `?w=1`: https://github.com/apache/iceberg/pull/10523/files?w=1