workbox icon indicating copy to clipboard operation
workbox copied to clipboard

Consider adding `maxUnusedSeconds` and `updateStale` to `ExpirationPlugin` config

Open atjn opened this issue 4 years ago • 11 comments

Library Affected: workbox-expiration

Issue or Feature Request Description: Hi there! Thanks for making this incredibly useful tool :)

I have a bunch of use cases where I need to either update a resource periodically or where I need to remove a resource that hasn't been used for a while. The ExpirationPlugin with maxAgeSeconds can sort of solve these use cases, but it isn't elegant and it has many bad edge cases.

I have tried to fix my use cases by making some custom plugins, but it is very complicated and requires a lot of IndexedDB calls. I then realised that 95% of the code I need has already been written for the maxAgeSeconds parameter, so maybe I should just propose some minor updates to workbox instead. And now that I have thought it through, I actually think this could solve a long list of challenges that would otherwise require custom code.

I will start with my proposed additions, then show an example. I am happy to discuss this in more detail, or provide more detailed use cases if you'd like :)

Proposed additions

The ExpirationPlugin gets two new parameters in its config:

maxUnusedSeconds number

This will be almost identical to maxAgeSeconds, but instead of calculating the time since the resource was last fetched, it calculates the time since the resource was last used on the website. In other words, how long ago was a request made for the cached resource. As with maxAgeSeconds, if the resource hasn't been used for that amount of time, it is removed from the cache.

updateStale boolean

When a cached resource is older than maxAgeSeconds, it will normally be removed from the cache, but if updateStale is set to true, the plugin will instead try to refetch and update the cached resource. If the update fails, the old resource is kept. This parameter does not affect removals caused by maxEntries or maxUnusedSeconds.

Example: A better image caching recipe

The default image caching recipe shows a simple "cache this for 30 days, then delete" use case.

Let's consider that plugin setup:

new ExpirationPlugin({
  maxEntries: 60,
  maxAgeSeconds: 30 * 24 * 60 * 60, // 30 Days
}),

..vs a default plugin setup that leverages the proposed options:

new ExpirationPlugin({
  maxEntries: 60,
  maxAgeSeconds: 30 * 24 * 60 * 60, // 30 Days
  updateStale: true,
  maxUnusedSeconds: 30 * 24 * 60 * 60, // 30 Days
})

Imagine that a user has been looking at an image every day for some time, but loses their internet connection 28 days after they saw the image for the first time, and the connection ends up being down for 7 days.

With the current setup, the image disappears 2 days after the internet goes down, and is missing for 5 days until the internet comes back up. There was no reason to delete the image, so this is not ideal. With the proposed setup, there will be a failed attempt at updating the image after 2 days, but since it was very recently seen, it will be kept for 5 days until it can eventually be updated.

Now imagine that the image is completely static and never requires updates, and that the user has this app for 5 years.

With the current setup, the app must delete the image once every 30 days, so it will have to download identical copies of the image 120 times. It isn't possible to fix this, because we want the image to expire if it isn't used. With the proposed setup, this would not be nearly as big a problem, because the image can be ETag validated, meaning it won't be downloaded again if the server copy hasn't changed.

With the proposed setup, it would also be possible to change the setup slightly so that no network activity occurs at all for all 5 years:

new ExpirationPlugin({
  maxEntries: 60,
  maxUnusedSeconds: 30 * 24 * 60 * 60, // 30 Days
})

atjn avatar Jun 11 '21 19:06 atjn

@jeffposnick any chance I could get some feedback on this? I'm pretty sure I can make a PR for it, I've just been waiting to hear if you agree with me before I start working on it.

atjn avatar Jul 02 '21 06:07 atjn

Hello! Sorry for the delay in responding. I've needed to find some time to dedicate to reading through this proposal. Thanks for taking the time to write it up, and provide those motivating examples. I've got some questions before I'd recommend you proceed, though.

For reference purposes, the code that currently handles the bulk of the expiration logic is https://github.com/GoogleChrome/workbox/blob/1ec268901d6d8ac5286ec3b473d5b535e88260a3/packages/workbox-expiration/src/ExpirationPlugin.ts#L145-L181

Focusing on the maxUnusedSeconds proposal, how would the logic change? It sounds to me like the main difference is that the updateTimestamp() call would need to run before cacheExpiration.expireEntries(), and that the _isResponseDateFresh() method would have to take maxUnusedSeconds into account. Am I missing anything more that would need to happen?

What happens if someone sets both maxUnusedSeconds and maxSeconds to different values? Which one would "win," or would that not be allowed?

I think that this feature is useful, but would also incur some extra DX overhead when it came to explaining the difference between maxAgeSeconds and maxUnusedSeconds.

Regarding updateStale, I'm less convinced that a configuration option that added fetch()ing behavior to ExpirationPlugin makes sense. From Workbox's perspective, Strategy classes as the things that interact with the network and keep cache entries up to date. It would be a bit hard to wrap your head around what would happen if you, for instance, used a StaleWhileRevalidate strategy with an new ExpirationPlugin({updateStale: true, ...}) associated with it.

So, I think you could potentially add maxUnusedSeconds, after some more discussion around those open questions.

That being said, once of the things that I'd really like to see more of is developers feeling empowered to write their own custom Workbox plugins and strategies (c.f. https://www.youtube.com/watch?v=jR9-aDWZeSE). You mentioned that needing to deal with IDB directly is a blocker, but what if we added CacheTimestampsModel.ts to the publicly exposed interface in workbox-expiration? You would still have to create your own versions of CacheExpiration.ts and ExpirationPlugin.ts that implemented the specific logic you wanted, but that might actually be cleaner (you could omit maxAgeSeconds and only support maxUnusedSeconds) and you'd be free to implement updateStale if you wanted to. Is that approach something you'd consider?

jeffposnick avatar Jul 09 '21 20:07 jeffposnick

Hello, and thank you for taking the time to look it over :)

It took me a little while to respond to your questions because I think I found a pretty big bug in the existing implementation, and it had me confused me for quite a while:

The documentation for what maxAgeSeconds does, is pretty vague:

The maximum age of an entry before it's treated as stale and removed.

...so I did some manual testing and came to the conclusion that it must be calculating the "age" as the time since the resource was downloaded form the server. And based on that assumption, I created this whole feature request for maxUnusedSeconds which calculates the age as the time since the resource was last used on the page.

But now it turns out, the existing implementation is actually doing a bit of both. The lightweight Date check you're doing in isResponseDateFresh is correctly determining the time since the resource was last downloaded from the server, but for the IndexedDB check in expireEntries to also work correctly, the timestamp must only be updated when a new version of the resource is downloaded from the server, and that is not currently the case, since you call updateTimestamp as part of cachedResponseWillBeUsed: https://github.com/GoogleChrome/workbox/blob/46af63c1780955345c117c63c8c8dd54f3d40220/packages/workbox-expiration/src/ExpirationPlugin.ts#L162-L164

In practice, this means that the behavior of the ExpirationPlugin can drastically change based on subtle differences. As an example, if you save your images with the CacheFirst strategy and set maxAgeSeconds to a month, then as long as all your images are served with a correct Date header, they will all be removed and get a fresh update from the server at least once a month. But if you one day accidentally disable the Date header on your server, then the serviceworker will fall back to IndexedDB timestamps, which are updated every time the images are used, so if a user watches those images at least once a month for several years, then the images will just stay in cache and never be redownloaded from the server for all those years.

So you could say that this is no longer a feature request, but more of a bug fix to make the developer decide which strategy to use, rather than letting it be up to chance.

By the way, this code is pretty complex, so please check the code yourself, and tell me if I'm wrong about all this 😄

atjn avatar Jul 17 '21 01:07 atjn

Focusing on the maxUnusedSeconds proposal, how would the logic change? It sounds to me like the main difference is that the updateTimestamp() call would need to run before cacheExpiration.expireEntries(), and that the _isResponseDateFresh() method would have to take maxUnusedSeconds into account. Am I missing anything more that would need to happen?

What happens if someone sets both maxUnusedSeconds and max(Age)Seconds to different values? Which one would "win," or would that not be allowed?

The two max*Seconds options are in practice two separate functions that can run independently of each other, so none of them would "win" over the other. The original example I wrote, details how you can use them simultaneously.

The new maxUnusedSeconds option would need a separate timestamp in IndexedDB, maybe called lastUsedTimestamp. It would also need its own updateTimestamp function, maybe called updateUsedTimestamp, and it would need to be called on handlerWillRespond (the simplest of several solutions IMO). At the same time, the call to updateTimestamp in cachedResponseWillBeUsed should be removed to fix the bug I described in the previous comment.

The expireEntries call could also be moved to handlerWillRespond to make it more accurate (because it runs more often), but also to make sure it runs after the updateUsedTimestamp function, as you mentioned. It would also need to be slightly expanded to now check if either of the timestamps exceed their appropriate max*Seconds option.

As for _isResponseDateFresh(), I don't see how that would be able to take maxUnusedSconds into account, since that would require IndexedDB calls no matter what, and the entire idea behind this function is that it can run synchronously because it is lightweight and doesn't need IndexedDB calls.

atjn avatar Jul 17 '21 01:07 atjn

Regarding updateStale, I'm less convinced that a configuration option that added fetch()ing behavior to ExpirationPlugin makes sense. From Workbox's perspective, Strategy classes as the things that interact with the network and keep cache entries up to date. It would be a bit hard to wrap your head around what would happen if you, for instance, used a StaleWhileRevalidate strategy with an new ExpirationPlugin({updateStale: true, ...}) associated with it.

I personally don't see the problem. On the contrary, It makes a lot of sense to me because ExpirationPlugin already has all the correct IndexedDB calls, asynchronous cross-resource checks, and configuration options that would be needed to pull this feature off, whereas you would otherwise have to reimplement all of that somewhere else. It is also something you can pair with different strategies, resulting in different setups.

In the case with a StaleWhileRevalidate strategy where a new ExpirationPlugin({updateStale: true, ...}) is associated with it, the plugin just wouldn't do anything for often used resources, but it would begin to update old resources that haven't been accessed in a while, thus making sure they don't become too stale if they were to suddenly be accessed while offline.

If you still don't like it, would you instead be interested in creating a new strategy called something like staleWithRevalidationSchedule? It would essentially be staleWhileRevalidate, but it just wouldn't revalidate every time if the resource wasn't too old according to a maxAgeSeconds parameter.

atjn avatar Jul 17 '21 02:07 atjn

That being said, once of the things that I'd really like to see more of is developers feeling empowered to write their own custom Workbox plugins and strategies (c.f. https://www.youtube.com/watch?v=jR9-aDWZeSE). You mentioned that needing to deal with IDB directly is a blocker, but what if we added CacheTimestampsModel.ts to the publicly exposed interface in workbox-expiration? You would still have to create your own versions of CacheExpiration.ts and ExpirationPlugin.ts that implemented the specific logic you wanted, but that might actually be cleaner (you could omit maxAgeSeconds and only support maxUnusedSeconds) and you'd be free to implement updateStale if you wanted to. Is that approach something you'd consider?

Yeah, for my specific use case, that would probably be all I need to get going, but I am still leaning towards adding these things as native features in WorkBox because I really think they could seriously improve some very common use cases, such as the image loading example I described earlier.

atjn avatar Jul 17 '21 02:07 atjn

Sorry for bombarding you with all this text, but it's just a really complex issue. I am happy to elaborate on any of it, and also wouldn't mind having a call with you, if it could help ease the process :)

atjn avatar Jul 17 '21 02:07 atjn

(Sorry, just checking in to let you know that I haven't forgotten about this!)

jeffposnick avatar Aug 04 '21 21:08 jeffposnick

Thanks for letting me know, looking forward to your reply :)

atjn avatar Aug 05 '21 16:08 atjn

@jeffposnick I noticed you added this to the v7 milestone, which sounds great! I just wanted to remind you that I am happy to submit a PR for these changes, although I can't give any guarantees as to when the PR would be ready :)

atjn avatar Jan 02 '22 00:01 atjn

I would actually take a guess that most people are looking for maxUnusedSeconds (not the current maxAgeSeconds). My use-case is perfectly illustrated by the original issue body. I have an image in my app that represents a profile picture and I only want to remove it from cache if it has not been accessed for the specified time.

A change of profile picture changes the URL, so I want to remove outdated URLs based off of when they were last accessed (because the caches never get out of date and it is not uncommon for someone to rarely update their profile picture). If someone changes profile picture, I want to eventually remove that from cache.

Instead of adding an extra maxUnusedSeconds an updateOnAccess or similar could be added, which means that updateTimestamp() would now be called on cache access as well (assuming usage of CacheFirst of course).

Bluenix2 avatar Sep 08 '22 20:09 Bluenix2

In practice, this means that the behavior of the ExpirationPlugin can drastically change based on subtle differences. As an example, if you save your images with the CacheFirst strategy and set maxAgeSeconds to a month, then as long as all your images are served with a correct Date header, they will all be removed and get a fresh update from the server at least once a month. But if you one day accidentally disable the Date header on your server, then the serviceworker will fall back to IndexedDB timestamps, which are updated every time the images are used, so if a user watches those images at least once a month for several years, then the images will just stay in cache and never be redownloaded from the server for all those years.

Gosh, the bug mentioned here was really surprising behavior and explains why some of our users sometimes find their data isn't updating. Somebody has explicitly raised the issue as https://github.com/GoogleChrome/workbox/issues/3070. It does feel like a bug to me, the word age in maxAgeSeconds shouldn't really mean time since last request, it's much more logical to assume it means the age of the cached resource.

mungojam avatar Oct 24 '22 21:10 mungojam