Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit

Open grantatspothero opened this issue 8 months ago • 2 comments

Skips FastAppend manifest cleanup after a successful commit when no retries have occurred, since no orphaned manifests can exist in that case. This speeds up the happy path of commits by removing two unnecessary reads:

- table metadata READ
- manifest list READ
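
A minimal sketch of the idea, for illustration only. It is not the actual SnapshotProducer code: the class `CommitCleanupSketch`, its fields, and the file names are all hypothetical, and the two cleanup reads are simulated with a counter. It assumes each commit attempt writes its own manifest list, so only a retried commit can leave an uncommitted one behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of a commit retry loop; none of these names are Iceberg APIs.
class CommitCleanupSketch {
    final List<String> writtenManifestLists = new ArrayList<>();
    final List<String> deletedFiles = new ArrayList<>();
    int cleanupReads = 0; // simulated table metadata + manifest list reads

    // attemptSucceeds decides, per 1-based attempt number, whether that attempt commits.
    String commit(IntPredicate attemptSucceeds, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Assumption: every attempt writes a fresh manifest list.
            String manifestList = "snap-attempt-" + attempt + ".avro";
            writtenManifestLists.add(manifestList);
            if (attemptSucceeds.test(attempt)) {
                // Only a retried commit can have left orphaned files behind,
                // so the first-attempt happy path skips cleanup entirely.
                if (attempt > 1) {
                    cleanUncommitted(manifestList);
                }
                return manifestList;
            }
        }
        // Cleanup on commit failure is never skipped.
        cleanAll();
        throw new IllegalStateException("commit failed after " + maxAttempts + " attempts");
    }

    private void cleanUncommitted(String committed) {
        cleanupReads += 2; // one table metadata READ, one manifest list READ
        for (String file : writtenManifestLists) {
            if (!file.equals(committed)) {
                deletedFiles.add(file);
            }
        }
    }

    private void cleanAll() {
        deletedFiles.addAll(writtenManifestLists);
    }

    public static void main(String[] args) {
        CommitCleanupSketch happy = new CommitCleanupSketch();
        happy.commit(attempt -> true, 3);
        System.out.println("first-attempt commit, cleanup reads: " + happy.cleanupReads);

        CommitCleanupSketch retried = new CommitCleanupSketch();
        retried.commit(attempt -> attempt == 2, 3);
        System.out.println("retried commit, cleanup reads: " + retried.cleanupReads
            + ", deleted: " + retried.deletedFiles);
    }
}
```

In this model a commit that succeeds on the first attempt performs no cleanup reads, while a commit that succeeds on a later attempt pays for both reads and deletes the manifest lists written by the earlier attempts.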

Context from slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1718381807647999

We are ingesting streaming data using a Java service that does Iceberg FastAppend commits. We noticed that roughly 20% (YMMV) of the FastAppend commit time for our use case is spent on unnecessary cleanup operations, specifically this code, which FastAppend inherits from SnapshotProducer: https://github.com/apache/iceberg/blob/apache-iceberg-1.5.2/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L422-L439

Testing:

I manually tested this by running TestFastAppend.testRecoveryWithManifestList and verifying that the cleanup code runs only when a retry occurs.

Notes:

- We do not skip cleanup operations on commit failures (see: cleanAll())
- Diff best viewed with ?w=1: https://github.com/apache/iceberg/pull/10523/files?w=1

Jun 17 '24 16:06 grantatspothero
