iceberg-rust

feat: [1/N] Add write-through cache for manifest list

Open · dentiny opened this issue 2 months ago

Which issue does this PR close?

  • Closes https://github.com/apache/iceberg-rust/pull/1698

What changes are included in this PR?

Context: I see a large share of CPU time spent loading manifest lists, especially in Avro deserialization (see the linked PR for details). I want to use the object cache to avoid unnecessary IO and deserialization.

I discussed this online with @liurenjie1024; see:

  • https://github.com/apache/iceberg-rust/pull/1698#issuecomment-3332906860
  • https://github.com/apache/iceberg-rust/pull/512#discussion_r2365940293

We lean towards the following approach:

  • Make the object cache a read-through and write-through cache for manifests and manifest lists
  • Subsequent loads try the object cache first; this could be either a read-through cache or, for easier implementation, a look-aside cache
  • Make the object cache internal and transparent, instead of allowing external applications to access it directly
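The difference between the three cache roles above can be sketched in a few lines. This is a minimal, self-contained illustration, not iceberg-rust's API: `ManifestList`, `ObjectCache`, `get_or_load` and `load_look_aside` are hypothetical names, and the loader closure stands in for "fetch the file and deserialize the Avro". Write-through is the writer calling `insert`; read-through means the cache itself invokes the loader on a miss; look-aside means the caller does the check-load-populate dance.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a decoded manifest list (the real type is richer).
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestList {
    pub manifest_paths: Vec<String>,
}

/// Hypothetical in-memory object cache keyed by file path.
/// Values are shared via `Arc`, so a hit costs a refcount bump, not a decode.
#[derive(Default)]
pub struct ObjectCache {
    entries: Mutex<HashMap<String, Arc<ManifestList>>>,
}

impl ObjectCache {
    pub fn get(&self, path: &str) -> Option<Arc<ManifestList>> {
        self.entries.lock().unwrap().get(path).cloned()
    }

    /// Write-through entry point: the writer calls this after persisting the file.
    pub fn insert(&self, path: &str, value: Arc<ManifestList>) {
        self.entries.lock().unwrap().insert(path.to_string(), value);
    }

    /// Read-through: the cache owns the miss path and runs the loader itself.
    /// (Two concurrent misses may both load; acceptable for a sketch.)
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<ManifestList>
    where
        F: FnOnce(&str) -> ManifestList,
    {
        if let Some(hit) = self.get(path) {
            return hit;
        }
        let value = Arc::new(load(path));
        self.insert(path, value.clone());
        value
    }
}

/// Look-aside: the caller checks the cache, loads on a miss, and populates it.
pub fn load_look_aside<F>(cache: &ObjectCache, path: &str, load: F) -> Arc<ManifestList>
where
    F: FnOnce(&str) -> ManifestList,
{
    match cache.get(path) {
        Some(hit) => hit,
        None => {
            let value = Arc::new(load(path));
            cache.insert(path, value.clone());
            value
        }
    }
}
```

Both read paths run the loader at most once per path; they differ only in where the miss-handling logic lives, which is why look-aside is the easier one to bolt onto existing call sites.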

I plan to split the work into a series of PRs as follows:

  • [x] Store the manifest list in the object cache when the cache is enabled
  • [ ] Load manifest lists with the object cache taken into account, which makes the object cache part of file IO
  • [ ] Apply the same procedure to manifest files

This PR completes the first part, which also benefits the existing cached scan.
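The first step (populate the cache on write, only when caching is enabled) could look roughly like the sketch below. It is a hedged illustration with hypothetical names throughout (`Storage`, `ObjectCache`, `ManifestListWriter`, `encode`); a newline-joined byte encoding stands in for Avro, and the optional cache models the "if cache enabled" switch. The key points are ordering (persist first, then cache) and that the writer inserts the value it already holds in memory, so a later read skips both IO and deserialization.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical decoded manifest list.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestList {
    pub manifest_paths: Vec<String>,
}

/// Hypothetical stand-in for object storage: path -> encoded bytes.
#[derive(Default)]
pub struct Storage {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl Storage {
    /// Always succeeds here; a real store could fail, hence the `Result`.
    fn put(&self, path: &str, bytes: Vec<u8>) -> Result<(), String> {
        self.files.lock().unwrap().insert(path.to_string(), bytes);
        Ok(())
    }
}

/// Hypothetical object cache holding decoded manifest lists.
#[derive(Default)]
pub struct ObjectCache {
    entries: Mutex<HashMap<String, Arc<ManifestList>>>,
}

impl ObjectCache {
    pub fn get(&self, path: &str) -> Option<Arc<ManifestList>> {
        self.entries.lock().unwrap().get(path).cloned()
    }
}

/// Toy encoding standing in for Avro: newline-joined manifest paths.
fn encode(list: &ManifestList) -> Vec<u8> {
    list.manifest_paths.join("\n").into_bytes()
}

pub struct ManifestListWriter<'a> {
    storage: &'a Storage,
    cache: Option<&'a ObjectCache>, // `None` when the object cache is disabled
}

impl<'a> ManifestListWriter<'a> {
    pub fn write(&self, path: &str, list: ManifestList) -> Result<(), String> {
        // 1. Persist the encoded file; bail out before touching the cache on failure.
        self.storage.put(path, encode(&list))?;
        // 2. Write-through: cache the in-memory value so readers never re-decode it.
        if let Some(cache) = self.cache {
            cache.entries.lock().unwrap().insert(path.to_string(), Arc::new(list));
        }
        Ok(())
    }
}
```

Caching only after a successful write avoids serving an entry for a file that was never committed to storage.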

Are these changes tested?

Yes, a unit test is added.

dentiny, Oct 01 '25 00:10