Freshness-aware table loading in REST catalog
Proposed Change
Some clients of the Iceberg table format (e.g. query engines) cache table metadata. To keep the cache up to date, they either implement mechanisms such as event processing (HMS with Impala) or simply perform a full table load on each request. This proposal introduces a way to perform an actual table load only if the table metadata has changed since the last request. It proposes a new Iceberg Catalog-level API and describes the implementation details for the REST catalog, including the changes required in the REST spec.
Typical use case this would solve:
- Engine receives a query for a particular table
- Engine doesn't have this table in its cache, so it loads it from an Iceberg Catalog (REST in this proposal)
- Engine gets another request for the same table
- Engine does a freshness-aware load for this table. A full table load is only performed if the table has changed since the previous load. If the table hasn't changed, the engine can use the copy in its cache.
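
The flow above can be sketched roughly as follows. This is only an illustration of the idea, not the proposed API: `InMemoryCatalog`, `load_table_if_changed`, `Engine`, and the integer `version` token are all hypothetical names invented for this example, and the actual freshness token and REST mechanics are whatever the proposal document defines.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableMetadata:
    name: str
    version: int  # hypothetical freshness token; the real one may differ


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog; counts how many full loads it serves."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}
        self.full_loads = 0

    def commit(self, name: str) -> None:
        # Simulate a change to the table by bumping its version.
        current = self._tables.get(name)
        next_version = 1 if current is None else current.version + 1
        self._tables[name] = TableMetadata(name, next_version)

    def load_table_if_changed(
        self, name: str, known_version: Optional[int]
    ) -> Optional[TableMetadata]:
        # Return None when the caller's copy is still current,
        # otherwise perform (and count) a full load.
        current = self._tables[name]
        if known_version is not None and known_version == current.version:
            return None
        self.full_loads += 1
        return current


class Engine:
    """Hypothetical query engine that caches table metadata."""

    def __init__(self, catalog: InMemoryCatalog) -> None:
        self._catalog = catalog
        self._cache: Dict[str, TableMetadata] = {}

    def table_for_query(self, name: str) -> TableMetadata:
        cached = self._cache.get(name)
        known_version = cached.version if cached is not None else None
        fresh = self._catalog.load_table_if_changed(name, known_version)
        if fresh is None:
            # Unchanged since the last load: reuse the cached copy.
            return cached
        self._cache[name] = fresh
        return fresh


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.commit("db.events")
    engine = Engine(catalog)
    engine.table_for_query("db.events")  # cache miss, full load
    engine.table_for_query("db.events")  # unchanged, served from cache
    catalog.commit("db.events")
    engine.table_for_query("db.events")  # changed, full load again
    print(catalog.full_loads)
```

In this sketch the catalog, not the engine, decides whether the cached copy is still current, so the engine never needs to process change events itself.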
Proposal document
https://docs.google.com/document/d/1rnVSP_iv2I47giwfAe-Z3DYhKkKwWCVvCkC9rEvtaLA
Specifications
- [X] Table
- [ ] View
- [X] REST
- [ ] Puffin
- [ ] Encryption
- [X] Other
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
I've been inactive for some weeks, but I'm still planning to work on this in the near future.