apollo-ios-dev

Initial version of Caching Rewrite RFC

Open BobaFetters opened this issue 8 months ago • 8 comments

Read the formatted RFC here

BobaFetters avatar Mar 14 '25 18:03 BobaFetters

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: bde5b2b832cfe04da1c9d551

svc-apollo-docs avatar Mar 14 '25 18:03 svc-apollo-docs

Deploy Preview for apollo-ios-docc canceled.

Name Link
Latest commit 833b70d3ff78ecf4667d080e198f3be99041363f
Latest deploy log https://app.netlify.com/sites/apollo-ios-docc/deploys/68066610da6306000895082e

netlify[bot] avatar Mar 14 '25 18:03 netlify[bot]

One thing that I think is missing from this is a discussion of changes to the NormalizedCache protocol's API itself. I know that we have gotten requests from users to give the cache protocol functions access to more context about the operation being run during a cache read (looking at you @jimisaacs 😉).

I'm waiting to add anything about the NormalizedCache protocol until your Swift Concurrency work is further along, since the protocol will change as part of that work.

BobaFetters avatar Mar 17 '25 17:03 BobaFetters

Thanks for the work and proposal on this, it's nice to see progress being made.

It would be helpful for me to understand how this would work in practice. For example, what happens if you have two queries, each including the same type and/or field, but with a different maxAge value? How would you know what TTL to use for the eviction process?

Separately, there's no mention in the RFC of types or fields you may not want to cache at all. Is that implied by using maxAge: 0? Or would it make sense to have a @cacheControl(noCache: true)?

PatrickDanino avatar Apr 16 '25 22:04 PatrickDanino

@PatrickDanino The cached data will have createdTime and lastUpdated timestamps, but the cached data itself doesn't know whether it is stale; that is up to the operation/query being run to decide.

So in your example of two queries with overlapping data, those queries may be used in separate parts of an application that have different requirements for how "fresh" the data needs to be. Query A may say the TTL is 5 min while query B says the TTL is 15 min. In that scenario, assuming the cached data is currently 10 min old, running query B pulls the data from the cache, while running query A treats the data as stale, so the data is fetched from the server and written back into the cache. If query B is then run again, it still pulls from the cache since the data is now brand new, and so does query A.
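As a rough illustration of the query A / query B scenario, here is a minimal Python sketch (not the Apollo iOS implementation). The names `CachedRecord`, `is_fresh`, and `read` are hypothetical; the only assumption taken from the discussion is that each record carries a last-updated timestamp and that the operation supplies its own max age.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    # Hypothetical stand-in for the lastUpdated timestamp (seconds).
    last_updated: float


def is_fresh(record: CachedRecord, max_age_seconds: float, now: float) -> bool:
    """The reading operation, not the record, decides whether data is stale."""
    return (now - record.last_updated) <= max_age_seconds


def read(record: CachedRecord, max_age_seconds: float, now: float) -> str:
    """Return where the data came from; a stale read simulates a refetch."""
    if is_fresh(record, max_age_seconds, now):
        return "cache"
    # Stale for this operation: pretend we fetched from the server and
    # wrote the fresh data back, which resets the timestamp.
    record.last_updated = now
    return "network"


now = 10_000.0
record = CachedRecord(last_updated=now - 10 * 60)  # data is 10 min old

QUERY_A_MAX_AGE = 5 * 60   # query A tolerates 5 min
QUERY_B_MAX_AGE = 15 * 60  # query B tolerates 15 min

results = [
    read(record, QUERY_B_MAX_AGE, now),  # B: 10 min old is still fresh
    read(record, QUERY_A_MAX_AGE, now),  # A: stale, so refetch and update
    read(record, QUERY_B_MAX_AGE, now),  # B: data is now brand new
    read(record, QUERY_A_MAX_AGE, now),  # A: data is now brand new
]
print(results)  # ['cache', 'network', 'cache', 'cache']
```

The point of the sketch is that the same record can be fresh for one operation and stale for another, because the max age lives with the operation rather than with the stored data.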

As for eviction, by default it will be LRU (least recently used) based, with some configuration available, but since the cached data doesn't inherently know whether it is stale, staleness isn't factored into eviction.
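For readers unfamiliar with LRU eviction, a minimal generic sketch in Python follows. It is not the proposed Apollo iOS design: the `LRUCache` class and its `max_entries` setting are hypothetical, and the RFC's actual configuration options may look different. Note that record age plays no part in what gets evicted here, only recency of use.

```python
from collections import OrderedDict


class LRUCache:
    """Generic least-recently-used cache; eviction ignores record age."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries  # hypothetical configuration knob
        self._records = OrderedDict()

    def get(self, key):
        if key not in self._records:
            return None
        # A read marks the entry as most recently used.
        self._records.move_to_end(key)
        return self._records[key]

    def put(self, key, value):
        self._records[key] = value
        self._records.move_to_end(key)
        # Evict from the least recently used end until within the limit.
        while len(self._records) > self.max_entries:
            self._records.popitem(last=False)

    def keys(self):
        return list(self._records)


cache = LRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # over the limit: "b" is evicted
print(cache.keys())  # ['a', 'c']
```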

As for excluding objects from the cache, that isn't currently called out. I will be updating the RFC soon with more details, specifically around the @cacheControl directive. But to answer your question: yes, the current idea is that a maxAge of 0 would mean the object is excluded from the cache.

BobaFetters avatar Apr 17 '25 05:04 BobaFetters

I wanted to share some feedback about how the current (v1.x) cache structure is really failing us right now in hopes that it can lead to improvements in 2.0.

We're using the SQLiteNormalizedCache, and have noticed that after a few months the app experience dramatically degrades. It's somewhat intuitive that more rows in the cache would lead to some slowness, but this felt different. After looking into it we discovered that users end up with a massive QUERY_ROOT entry. It can literally be several MB.

This single row gets read in as a string, parsed into a dictionary, and then inspected before proceeding to run some more queries to fulfill a cache read. When any query data is written to the cache, the same process happens to merge the new data in. The column is read, deserialized to a dictionary, updated in memory, re-serialized, and written back to SQLite.

This causes slowness, but also memory issues (one user hit 3GB memory usage due to many concurrent reads), and excessive disk writes (ultimately the SQL must be written to disk).
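The read-modify-write cycle described above can be sketched roughly as follows. This is a simplified Python model, not the actual SQLiteNormalizedCache code: the `records` table layout, the `merge_into_root` helper, and the cursor-style field keys are assumptions made for illustration.

```python
import json
import sqlite3

# Assumed layout: one row per cache key, with the record stored as JSON text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (key TEXT PRIMARY KEY, record TEXT)")
db.execute("INSERT INTO records VALUES (?, ?)", ("QUERY_ROOT", json.dumps({})))


def merge_into_root(db: sqlite3.Connection, field_key: str) -> int:
    """Add one field to QUERY_ROOT; returns the size of the rewritten row."""
    (raw,) = db.execute(
        "SELECT record FROM records WHERE key = ?", ("QUERY_ROOT",)
    ).fetchone()
    root = json.loads(raw)  # deserialize the whole row
    root[field_key] = {"$reference": f"QUERY_ROOT.{field_key}"}
    serialized = json.dumps(root)  # re-serialize the whole row
    db.execute(
        "UPDATE records SET record = ? WHERE key = ?", (serialized, "QUERY_ROOT")
    )
    return len(serialized)


# Hypothetical keys: each distinct cursor argument adds another root field.
sizes = [
    merge_into_root(db, f"favorites(first:200,after:cursor{i})")
    for i in range(1000)
]
print(sizes[0], sizes[-1])  # the row rewritten on every write keeps growing
```

Every write pays for the full size of the row, so the cost of adding one small field grows with everything previously stored under the root.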

It feels especially wasteful because many of the entries are redundant. In the example below, it feels like the cache should just look for a row with this key instead of using this root object.

"favorites(first:200)": { "$reference": "QUERY_ROOT.favorites(first:200)" },

So why is it getting so large? Many of our APIs take dynamic inputs like timestamps (refreshing) or cursor identifiers (pagination). We want these queries to be cached for a short period of time (like within the same session) and most importantly want the underlying entities to get updated, so telling the system to skip the cache entirely doesn't fit our needs.

In the short term, we're going to just clear the cache after it grows too large, and exclude some queries, but it feels like a better design is possible here. Hoping 2.0 can offer some relief in this area and avoid the issue we're facing.

pixelmatrix avatar Sep 03 '25 04:09 pixelmatrix

@pixelmatrix It sounds like what you are running into should be resolved by this new caching work. The SQLite structure is changing to store one field per row, which facilitates some more advanced functionality, instead of storing the data as a blob based on the query that was run. The time-to-live (max-age) support will also let you control how long that data is used from the cache.
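To make the one-field-per-row idea concrete, here is a hedged Python sketch. The `fields` table, its column names, and the `write_field` / `read_field` helpers are all hypothetical; the RFC's actual schema may differ. It only shows why a write touches a single small row and how a per-field timestamp could support max-age checks.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (object, field) pair with its own timestamp.
db.execute(
    """
    CREATE TABLE fields (
        object_key   TEXT NOT NULL,
        field_key    TEXT NOT NULL,
        value        TEXT NOT NULL,
        last_updated REAL NOT NULL,
        PRIMARY KEY (object_key, field_key)
    )
    """
)


def write_field(db, object_key, field_key, value, now):
    """Insert or overwrite exactly one row; other fields are untouched."""
    db.execute(
        "INSERT OR REPLACE INTO fields VALUES (?, ?, ?, ?)",
        (object_key, field_key, json.dumps(value), now),
    )


def read_field(db, object_key, field_key):
    """Return (value, last_updated) for one field, or None if missing."""
    row = db.execute(
        "SELECT value, last_updated FROM fields "
        "WHERE object_key = ? AND field_key = ?",
        (object_key, field_key),
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]), row[1]


write_field(db, "QUERY_ROOT", "favorites(first:200)", ["User:1"], now=100.0)
write_field(db, "User:1", "name", "Ada", now=100.0)
write_field(db, "User:1", "name", "Grace", now=160.0)  # rewrites one row only

print(read_field(db, "User:1", "name"))  # ('Grace', 160.0)
```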

BobaFetters avatar Sep 04 '25 15:09 BobaFetters

Glad to hear that! Is this something I can try out in the current beta, or will it land in a future 2.0 release? I've been trying to get it running, but hit some bugs.

pixelmatrix avatar Sep 04 '25 16:09 pixelmatrix