iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Core: Freshness-aware table loading in REST catalog

Open gaborkaszab opened this issue 2 months ago • 11 comments

This is the client-side improvement for the freshness-aware table loading in REST catalog. The main design is the following: - REST server can send an ETag with the LoadTableResponse - The client can use this ETag to populate the IF_NONE_MATCH header with the next loadTable request - The server can send a 304-NOT_MODIFIED response without a body if the table has not been changed based on the ETag - The client when receives a 304, then returns the latest table object associated with the ETag from cache

gaborkaszab avatar Oct 22 '25 12:10 gaborkaszab

This change has 2 dependencies also included in the chain:

  1. https://github.com/apache/iceberg/pull/14035 for the IRC to return 304
  2. https://github.com/apache/iceberg/pull/14378 For FileIOTracker tests

gaborkaszab avatar Oct 22 '25 12:10 gaborkaszab

Let me give an initial insight into the different decisions I made during the implementation:

1) Client-side table cache

Entire table objects are cached on the client side. The table objects hold user level configs and credentials, so in order to not share these accross users, the client-side table cache is arranged on a per-sessionID basis. For simplicity I chose to use a composite key: (SessionID, TableIdentifier) for the cache.

Later if there is a need, the cache can be enhanced to have 2 separate levels for the sessionID and the TableIdentifier instead of a composite key. With this we can have different cache eviction policies for the sessions (max number if entries, TTL) and for the tables. Also we could separate sessions not to evict tables related to other sessions. For a reference implementation see RESTTableCache here.

2) Table objects returned by loadTable

Currently, each RESTCatalog.loadTable() call returns a different Table object. Changing this and returning 'shared' table objects similarly to CachingCatalog is a breaking change in my opinion. Take this use-case: One query could load the table for query planning, then another table also loads the table and commits some changes. In this case the table being held by the first query is changed in the meantime, and as it's a shared object it could mess up planning. This is why I chose not to return the cached Table objects directly, but to do a 'clone' object instead and return that. As a result, in case the client commits to the table, it won't be reflected in the cache and the next loadTable request to the server will result in a full table load, because the ETag stored on the client side is not the latest one.

3) Table commits

As a direct result of the above point the table commits through RESTTableOperations aren't reflected in the cache. As a future improvement we can enhance this, but I haven't found a very clean solution for this. What I experimented with is to provide a loadTableCallback to the RESTTableOperations that is called after getting the LoadTableResult. Setting up such a callback properly can update the cache in RESTSessionCatalog. For a reference implementation see RESTTableOperations and RESTSessionCatalog.loadTableCallback() here.

4) SnapshotMode = REFS

When lazy snapshot loading is configured, loadTable will populate the cache with a table object that has partially loaded snapshot list. If the client calls snapshots() and loads the rest of the snapshots, this won't be reflected in the cache. As a future improvement the snapshotsSupplier provided to TableMetadata could be clever enough to refresh the cache too. I found thes too complicated to implement for the initial version.

5) Place of the ETag

Initially I've put the ETag into RESTTableOperations, but since the operations doesn't participate in the cache maintenance nor in freshness-aware loading, then I decided not to put ETag there. TableMetadata seemed too general, I wanted to use a REST catalog specific location, hence I've introduced a composite value for the table cache TableWithETag.

6) Cache eviction policies

No weakKeys, weakValues (or similar) could be used with the current design: the returned tables are clones, so weakValues would nearly immediately evict the entry from the cache right after adding it. The maximum number of entries is controlled via a catalog property. With the (SessionID, TableIdentifier) composite key this means that one active user could evict the tables from a less active user. For future improvements see section 1) Another way of evicting from cache is using a TTL that is also configurable by a catalog property. Note, the timer resets after write and not after access. The reason is that when loading the table into the cache will also contain configs and credentials that could expire. Setting a TTL after write properly could evict the table from cache before the expiration of such configs and credentials.

gaborkaszab avatar Oct 22 '25 13:10 gaborkaszab

Last change is a rebase with main

gaborkaszab avatar Nov 13 '25 11:11 gaborkaszab

Hi @danielcweeks @nastra @amogh-jahagirdar , This is the client-side of the freshness-aware loading improvement in REST catalog. Would you mind taking a look?

gaborkaszab avatar Nov 20 '25 13:11 gaborkaszab

Hi @danielcweeks @amogh-jahagirdar @nastra , This one would be the finishing step for the freshness-aware loading functionality in REST catalog. Would you mind taking a look? I described a number of decision points and the reason for my decisions in one of the first comments of this PR.

gaborkaszab avatar Dec 01 '25 09:12 gaborkaszab

LGTM, please rebase. Let's see if @danielcweeks, @nastra or @amogh-jahagirdar has any more comments

pvary avatar Dec 05 '25 09:12 pvary

Thanks for taking a look @pvary ! This apparently collided with the injectable rest ops PR. Resolved the conflicts.

gaborkaszab avatar Dec 05 '25 11:12 gaborkaszab

Conflicts with some of the recent changes around this area. Will rebase once this PR is merged.

gaborkaszab avatar Dec 10 '25 10:12 gaborkaszab

Rebased with main but apparently has another merge conflict. This time with the server-side planning changes

gaborkaszab avatar Dec 11 '25 08:12 gaborkaszab

There is a filing test in TestRESTScanPlanning, but that apparently fails across other PRs too. cc @singhpk234

gaborkaszab avatar Dec 11 '25 16:12 gaborkaszab

Thanks @gaborkaszab i am not sure what is causing the flakyness as my CI green when we merged, i will take a deeper look and put a fix ASAP.

singhpk234 avatar Dec 11 '25 16:12 singhpk234

Update on the latest changes: There have been multiple merged PRs recently that conflicted with this one: injectable RESTTableOperations and server-side planning. In fact because of these features the freshness-aware loading code had to be revised to support interworking with the mentioned functionalities. As a result, the table cache doesn't hold actual table objects, but table suppliers, that can be used to produce table objects in case of a 304. With this, encapsulating TableOps and RESTTable creation into this table supplier we won't break injectable ops, and we can use freshness-aware loading for RESTTables too. Thanks @pvary for the help here!

As a side-effect, since we don't cache table (including ops and IO), the FileIOTracker tests are no longer needed within this PR. Removed them.

cc @flyrain @XJDKC @singhpk234 since this intersects with the PRs you've merged recently.

gaborkaszab avatar Dec 15 '25 20:12 gaborkaszab

Looks nice @gaborkaszab!

Do we have tests for how the injectable table ops change interacts with the cache?

pvary avatar Dec 16 '25 09:12 pvary

Thanks for taking a look, @pvary !

Do we have tests for how the injectable table ops change interacts with the cache? Apparently we don't. Added a new test to cover this.

gaborkaszab avatar Dec 16 '25 12:12 gaborkaszab