iceberg-rust
feat: [1/N] Add write-through cache for manifest list
Which issue does this PR close?
- Closes https://github.com/apache/iceberg-rust/pull/1698
What changes are included in this PR?
Context: I see a large amount of CPU time spent on manifest list loading, especially Avro deserialization (see the attached PR for details). I want to leverage the object cache to avoid unnecessary IO and deserialization.
I discussed this briefly online with @liurenjie1024; see:
- https://github.com/apache/iceberg-rust/pull/1698#issuecomment-3332906860
- https://github.com/apache/iceberg-rust/pull/512#discussion_r2365940293
We lean towards the following approach:
- Make the object cache a read-through and write-through cache for manifests and manifest lists
- Have later loads try the object cache first; this could be either a read-through cache or a look-aside cache for easier implementation
- Make the object cache internal and transparent, instead of allowing external applications to access it directly
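To make the intended behavior concrete, here is a minimal sketch of the write-through and read-through paths, using only the standard library. All names in it (`ManifestList`, `ToyObjectCache`, `ToyStorage`, `write_manifest_list`, `load_manifest_list`) are hypothetical stand-ins for illustration and are not the actual iceberg-rust types or signatures; the real code deals with Avro encoding, async IO, and cache eviction, none of which is modeled here.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a decoded manifest list.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestList {
    pub manifest_paths: Vec<String>,
}

/// Toy cache keyed by manifest list location (not the crate's real cache type).
#[derive(Default)]
pub struct ToyObjectCache {
    manifest_lists: Mutex<HashMap<String, Arc<ManifestList>>>,
}

impl ToyObjectCache {
    pub fn get(&self, location: &str) -> Option<Arc<ManifestList>> {
        self.manifest_lists.lock().unwrap().get(location).cloned()
    }

    pub fn put(&self, location: &str, list: Arc<ManifestList>) {
        self.manifest_lists
            .lock()
            .unwrap()
            .insert(location.to_string(), list);
    }
}

/// Toy storage; `decodes` counts how often a file had to be read and decoded.
#[derive(Default)]
pub struct ToyStorage {
    files: Mutex<HashMap<String, ManifestList>>,
    decodes: AtomicUsize,
}

impl ToyStorage {
    pub fn write(&self, location: &str, list: &ManifestList) {
        self.files
            .lock()
            .unwrap()
            .insert(location.to_string(), list.clone());
    }

    pub fn read_and_decode(&self, location: &str) -> Option<ManifestList> {
        // Stands in for the expensive IO + deserialization step.
        self.decodes.fetch_add(1, Ordering::SeqCst);
        self.files.lock().unwrap().get(location).cloned()
    }

    pub fn decode_count(&self) -> usize {
        self.decodes.load(Ordering::SeqCst)
    }
}

/// Write-through: persist the list, then keep the in-memory value in the
/// cache when a cache is enabled.
pub fn write_manifest_list(
    storage: &ToyStorage,
    cache: Option<&ToyObjectCache>,
    location: &str,
    list: ManifestList,
) -> Arc<ManifestList> {
    storage.write(location, &list);
    let list = Arc::new(list);
    if let Some(cache) = cache {
        cache.put(location, Arc::clone(&list));
    }
    list
}

/// Read-through: try the cache first; on a miss, decode from storage and
/// populate the cache before returning.
pub fn load_manifest_list(
    storage: &ToyStorage,
    cache: Option<&ToyObjectCache>,
    location: &str,
) -> Option<Arc<ManifestList>> {
    if let Some(cache) = cache {
        if let Some(hit) = cache.get(location) {
            return Some(hit);
        }
    }
    let list = Arc::new(storage.read_and_decode(location)?);
    if let Some(cache) = cache {
        cache.put(location, Arc::clone(&list));
    }
    Some(list)
}
```

In this sketch, a load that follows a write with the cache enabled never reaches `read_and_decode`, which is the saving the write-through path is meant to provide. Passing `None` for the cache models the "cache disabled" case, where every load pays the decode cost.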
I plan to structure and split the series of PRs as follows:
- [x] Store manifest list into object cache, if cache enabled
- [ ] Load the manifest list by consulting the object cache first, which makes object store a part of file IO
- [ ] Replicate the same procedure to manifest files
This PR completes the first step, which also benefits the existing cached scan.
Are these changes tested?
Yes, unit test added.