Freshness aware table loading in REST catalog

Open gaborkaszab opened this issue 1 year ago • 2 comments

Proposed Change

Some clients of the Iceberg table format (e.g. query engines) cache table metadata. To keep the cache up to date, they either implement mechanisms such as event processing (HMS with Impala) or simply do a full table load on each request. This proposal introduces a way to perform an actual table load only if the table metadata has changed since the last request. It proposes a new Iceberg Catalog-level API and describes the implementation details for the REST catalog, including the changes required to the REST spec.

Typical use case this would solve:

  • Engine receives a query for a particular table
  • Engine doesn't have this table in the cache so loads it from an Iceberg Catalog (REST in this proposal)
  • Engine gets another request for the same table
  • Engine does a freshness-aware load of this table. A full table load is performed only if the table has changed since the previous load. If the table hasn't changed, the engine can use the one in its cache.
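
To make the flow above concrete, here is a minimal Python sketch. It is an illustration only and not the API from the proposal document: `FakeCatalog`, `load_table_if_changed`, `Engine`, and the integer `version` token are all hypothetical names, and the catalog is an in-memory stand-in rather than a real REST client. The assumption is that the catalog can compare a client-supplied token against the current table state, similar in spirit to an HTTP ETag check.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableMetadata:
    name: str
    version: int  # hypothetical change token; bumped on every commit


class FakeCatalog:
    """In-memory stand-in for an Iceberg catalog (not the real REST client)."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}
        self.full_loads = 0  # counts how often full metadata was returned

    def commit(self, name: str) -> None:
        """Create the table or simulate a metadata change on it."""
        current = self._tables.get(name)
        next_version = 1 if current is None else current.version + 1
        self._tables[name] = TableMetadata(name, next_version)

    def load_table_if_changed(
        self, name: str, known_version: Optional[int]
    ) -> Optional[TableMetadata]:
        """Return metadata only if it differs from what the caller holds."""
        current = self._tables[name]
        if known_version == current.version:
            return None  # unchanged: caller keeps using its cached copy
        self.full_loads += 1
        return current


class Engine:
    """Query engine that caches table metadata per table name."""

    def __init__(self, catalog: FakeCatalog) -> None:
        self._catalog = catalog
        self._cache: Dict[str, TableMetadata] = {}

    def table_for_query(self, name: str) -> TableMetadata:
        cached = self._cache.get(name)
        known = cached.version if cached is not None else None
        fresh = self._catalog.load_table_if_changed(name, known)
        if fresh is None:
            return cached  # table unchanged since the last load
        self._cache[name] = fresh
        return fresh
```

In this sketch the first query for a table triggers a full load, a second query with no intervening commit is served from the engine's cache, and a query after a commit triggers a full load again.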

Proposal document

https://docs.google.com/document/d/1rnVSP_iv2I47giwfAe-Z3DYhKkKwWCVvCkC9rEvtaLA

Specifications

  • [X] Table
  • [ ] View
  • [X] REST
  • [ ] Puffin
  • [ ] Encryption
  • [X] Other

gaborkaszab, Dec 12 '24 13:12

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Jun 11 '25 00:06

I've been inactive for some weeks, but I'm still planning to work on this in the near future.

gaborkaszab, Jun 15 '25 04:06