Initial version of Caching Rewrite RFC
One thing that I think is missing from this is a discussion of changes to the `NormalizedCache` protocol's API itself. I know that we have gotten requests from users to give the cache protocol functions access to more context about the operation being run during a cache read (looking at you @jimisaacs 😉).
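Purely as a hypothetical illustration of what that could look like (none of these names are the real API), something like an operation context passed into the read functions would cover it:

```swift
// Hypothetical, non-final sketch: a context object handed to cache reads
// so a custom cache implementation can see which operation triggered the
// lookup. The record value type is a placeholder.
struct CacheReadContext {
    let operationName: String
    let variables: [String: String]
}

protocol ContextAwareNormalizedCache {
    func loadRecords(forKeys keys: Set<String>,
                     context: CacheReadContext) throws -> [String: Any]
}
```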
I'm waiting to add anything about the `NormalizedCache` protocol until your Swift Concurrency work is further along, since there will be changes to the protocol to go along with that work.
Thanks for the work and proposal on this, it's nice to see progress being made.
It would be helpful for me to understand how this would work in practice. For example, what happens if you have two queries, each including the same type and/or field, but with a different maxAge value? How would you know what TTL to use for the eviction process?
Separately, there's no mention in the RFC of types or fields you may not want to cache at all. Is that implied by using maxAge: 0? Or would it make sense to have a @cacheControl(noCache: true)?
@PatrickDanino So the cached data will have a createdTime and a lastUpdated timestamp, but the cached data itself doesn't know whether it is stale or not; that is up to the operation/query being run to decide.
So in your example of two queries with overlapping data, those queries may be used in separate parts of an application that have different requirements for how "fresh" the data needs to be. Query A may say the TTL is 5 minutes while query B says the TTL is 15 minutes. In that scenario, assuming the cached data is currently 10 minutes old, running query B will pull the data from the cache, while running query A will treat the data as stale, fetch it from the server, and update the cache. If query B were run again after that, it would still pull from the cache since the data is now brand new, and so would query A.
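To make that concrete, here's a rough sketch of that read-time check (all names here are hypothetical, not the final API):

```swift
import Foundation

// Hypothetical metadata carried by a cached record (made-up names).
struct CacheRecordMetadata {
    let createdTime: Date
    let lastUpdated: Date
}

// The record only carries timestamps; the operation supplies its own
// freshness requirement (max age) at read time.
func isStale(_ metadata: CacheRecordMetadata,
             maxAge: TimeInterval,
             now: Date = Date()) -> Bool {
    now.timeIntervalSince(metadata.lastUpdated) > maxAge
}

// The scenario above: data last updated 10 minutes ago.
let record = CacheRecordMetadata(createdTime: Date(timeIntervalSinceNow: -600),
                                 lastUpdated: Date(timeIntervalSinceNow: -600))

print(isStale(record, maxAge: 15 * 60)) // query B (15 min TTL): false, served from cache
print(isStale(record, maxAge: 5 * 60))  // query A (5 min TTL): true, refetched from server
```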
As for eviction, by default it will be LRU (least recently used) based, with some configuration available. But since the cached data doesn't inherently know whether it is stale, staleness isn't factored into eviction.
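For reference, LRU in its simplest form looks something like this (a generic sketch, not Apollo's actual implementation):

```swift
// Minimal LRU sketch: keys are tracked in recency order and the least
// recently used key is evicted once capacity is exceeded. Staleness plays
// no part here, matching the behavior described above.
struct LRUCache<Key: Hashable, Value> {
    private var storage: [Key: Value] = [:]
    private var recency: [Key] = []  // most recently used at the end
    let capacity: Int

    init(capacity: Int) { self.capacity = capacity }

    mutating func value(forKey key: Key) -> Value? {
        guard let value = storage[key] else { return nil }
        markUsed(key)
        return value
    }

    mutating func setValue(_ value: Value, forKey key: Key) {
        storage[key] = value
        markUsed(key)
        if storage.count > capacity, let oldest = recency.first {
            recency.removeFirst()
            storage.removeValue(forKey: oldest)
        }
    }

    private mutating func markUsed(_ key: Key) {
        if let index = recency.firstIndex(of: key) {
            recency.remove(at: index)
        }
        recency.append(key)
    }
}
```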
As for ignoring objects from the cache, you're right that it currently isn't called out. I will be updating the RFC soon with more details, specifically around the @cacheControl directive. But to answer your question: yes, the current idea is that a maxAge of 0 would mean the object is excluded from the cache entirely.
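Conceptually, the write path would treat a zero max age as a do-not-cache signal. A sketch with made-up names, since none of this is final:

```swift
import Foundation

// Hypothetical policy resolved from a @cacheControl directive (made-up type).
struct CacheControlPolicy {
    let maxAge: TimeInterval?  // nil = no directive present
}

// A maxAge of 0 means the object is never written to the cache.
func shouldWriteToCache(_ policy: CacheControlPolicy) -> Bool {
    guard let maxAge = policy.maxAge else { return true }
    return maxAge > 0
}

print(shouldWriteToCache(CacheControlPolicy(maxAge: 0)))   // false: object is ignored
print(shouldWriteToCache(CacheControlPolicy(maxAge: 300))) // true: cached with a 5 min TTL
```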
I wanted to share some feedback about how the current (v1.x) cache structure is really failing us right now in hopes that it can lead to improvements in 2.0.
We're using the SQLiteNormalizedCache, and have noticed that after a few months the app experience dramatically degrades. It's somewhat intuitive that more rows in the cache would lead to some slowness, but this felt different. After looking into it we discovered that users end up with a massive QUERY_ROOT entry. It can literally be several MB.
This single row gets read in as a string, parsed into a dictionary, and then inspected before proceeding to run some more queries to fulfill a cache read. When any query data is written to the cache, the same process happens to merge the new data in. The column is read, deserialized to a dictionary, updated in memory, re-serialized, and written back to SQLite.
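Roughly, every write goes through a cycle like this (a simplified sketch with hypothetical names, not the actual SQLiteNormalizedCache code):

```swift
import Foundation

// The whole QUERY_ROOT row is round-tripped on every write, so the cost
// grows with the size of the row, not the size of the change.
func mergeIntoQueryRoot(newFields: [String: Any],
                        readRow: () -> String,
                        writeRow: (String) -> Void) throws {
    let json = readRow()                                        // 1. read the multi-MB string out of SQLite
    var root = try JSONSerialization.jsonObject(
        with: Data(json.utf8)) as? [String: Any] ?? [:]         // 2. parse it into a dictionary
    root.merge(newFields) { _, new in new }                     // 3. merge the new data in memory
    let data = try JSONSerialization.data(withJSONObject: root) // 4. re-serialize the entire dictionary
    writeRow(String(decoding: data, as: UTF8.self))             // 5. write the whole row back to SQLite
}
```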
This causes slowness, but also memory issues (one user hit 3GB of memory usage due to many concurrent reads) and excessive disk writes (ultimately the whole re-serialized row must be written back to disk).
It feels especially wasteful because many of the entries are redundant. In the example below, it feels like the cache should just look for a row with this key instead of going through this root object:
"favorites(first:200)": { "$reference": "QUERY_ROOT.favorites(first:200)" },
So why is it getting so large? Many of our APIs take dynamic inputs like timestamps (for refreshing) or cursor identifiers (for pagination). We want these queries to be cached for a short period of time (within the same session, say), and most importantly we want the underlying entities to get updated, so telling the system to skip the cache entirely doesn't fit our needs.
In the short term, we're going to just clear the cache after it grows too large, and exclude some queries, but it feels like a better design is possible here. Hoping 2.0 can offer some relief in this area and avoid the issue we're facing.
@pixelmatrix It sounds like what you are running into should be resolved by this new caching work. The SQLite structure is changing to store one field per row (rather than storing the data for a whole query as a single blob), which enables some more advanced functionality. The time-to-live (max-age) support will also let you control how long data is served from the cache.
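To illustrate what "one field per row" means in practice, the rows would conceptually look something like this (an illustrative sketch, not the actual schema):

```swift
import Foundation

// Illustrative shape of a field-per-row record (made-up names, not the
// real schema). Each row holds a single field of a single object, plus
// the timestamps the TTL (max-age) checks are based on.
struct CacheFieldRow {
    let cacheKey: String   // e.g. "QUERY_ROOT.favorites(first:200)"
    let fieldName: String
    let value: String      // serialized field value
    let createdTime: Date
    let lastUpdated: Date
}

// Writing one field now touches one small row instead of re-serializing
// a single multi-megabyte QUERY_ROOT blob.
let row = CacheFieldRow(cacheKey: "QUERY_ROOT.favorites(first:200)",
                        fieldName: "totalCount",
                        value: "200",
                        createdTime: Date(),
                        lastUpdated: Date())
```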
Glad to hear that! Is this something I can try out in the current beta, or will it land in a future 2.0 release? I've been trying to get it running, but hit some bugs.