
Proposal: workbox-clear-storage

Status: Open · jeffposnick opened this issue 3 years ago · 3 comments

Motivation

Any well-planned service worker production rollout should include steps for rolling back service worker functionality and clearing out cached data, in case of a vulnerability or unintended behavior that a simple remediation can't fix. This "kill switch" behavior is best planned ahead of time, rather than improvised in the middle of a production issue.

There are some existing recipes for implementing "kill switches," including this popular Stack Overflow answer. That approach is straightforward, requiring as little as a few lines of code in a special-purpose service worker file, but more comprehensive solutions (e.g., ones that clear cached data across multiple storage mechanisms) are necessarily more involved.
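For illustration, here is a minimal sketch of that kind of special-purpose service worker, slightly extended to also delete caches. It is not the exact code from the linked answer. It activates immediately, deletes every cache, unregisters itself, and reloads the window clients it controls. To keep the sketch checkable outside a browser, the global scope is passed in as a parameter typed with a hypothetical `KillSwitchScope` interface that mirrors only the members used; in a real service worker file you would pass `self`.

```typescript
// Hypothetical structural types covering only the parts of
// ServiceWorkerGlobalScope this sketch touches.
interface WindowClientLike {
  url: string;
  navigate(url: string): Promise<unknown>;
}

interface ExtendableEventLike {
  waitUntil(promise: Promise<unknown>): void;
}

interface KillSwitchScope {
  addEventListener(type: string, listener: (event: ExtendableEventLike) => void): void;
  skipWaiting(): Promise<void>;
  caches: { keys(): Promise<string[]>; delete(cacheName: string): Promise<boolean> };
  registration: { unregister(): Promise<boolean> };
  clients: { matchAll(options: { type: 'window' }): Promise<WindowClientLike[]> };
}

function installKillSwitch(scope: KillSwitchScope): void {
  // Take over as soon as possible instead of waiting for old tabs to close.
  scope.addEventListener('install', (event) => {
    event.waitUntil(scope.skipWaiting());
  });

  scope.addEventListener('activate', (event) => {
    event.waitUntil((async () => {
      // Delete every cache in CacheStorage for this origin.
      const cacheNames = await scope.caches.keys();
      await Promise.all(cacheNames.map((cacheName) => scope.caches.delete(cacheName)));

      // Remove this service worker's registration.
      await scope.registration.unregister();

      // Reload controlled windows; after unregistering, the reload is uncontrolled.
      const windows = await scope.clients.matchAll({ type: 'window' });
      await Promise.all(windows.map((client) => client.navigate(client.url)));
    })());
  });
}

// In the kill-switch service worker file (assumes the WebWorker lib):
// installKillSwitch(self as unknown as KillSwitchScope);
```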

Implementing "kill switch" behavior from within the ServiceWorkerGlobalScope also differs from implementing the same thing inside the WindowGlobalScope (i.e., the web app's own scope), and each is valid in different scenarios.

For these reasons, it would make sense for Workbox to ship a new module, tentatively named workbox-clear-storage, that could be used from either the ServiceWorkerGlobalScope or the WindowGlobalScope. It would provide a way to unregister service workers, along with fine-grained control over clearing multiple types of storage across the entire origin.
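Because a single module would run in both contexts, it needs some way to tell them apart at runtime. A minimal sketch; `detectScope` and `ScopeKind` are hypothetical names, and the checks rely only on the standard `ServiceWorkerGlobalScope` and `Window` interfaces being exposed as globals in their respective contexts. The constructors are looked up dynamically so the same code also loads where neither exists.

```typescript
type ScopeKind = 'service-worker' | 'window' | 'other';

function detectScope(globalObject: unknown = globalThis): ScopeKind {
  // Each constructor only exists as a global in its own kind of scope.
  const globals = globalThis as unknown as Record<string, unknown>;
  const swScopeCtor = globals['ServiceWorkerGlobalScope'];
  const windowCtor = globals['Window'];

  if (typeof swScopeCtor === 'function' && globalObject instanceof swScopeCtor) {
    return 'service-worker';
  }
  if (typeof windowCtor === 'function' && globalObject instanceof windowCtor) {
    return 'window';
  }
  return 'other';
}
```

A window-context call could then include window-only storage in what it clears, while a service-worker call would skip it.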

Naming

I'm not keen on using the "kill" terminology as part of a module name. My current thinking is workbox-clear-storage. Alternative names I've considered are workbox-clear-data or workbox-clear-site-data, but the functionality isn't quite the same as what the Clear-Site-Data header offers, and I don't want to confuse folks.

Proposed API

The API is designed around the idea of giving developers fine-grained control over what gets deleted/unregistered, while defaulting to a maximal "delete as much as possible" approach.

For each type of storage you might want cleared, you'd supply an async predicate function that would determine whether a specific "unit" of that storage should be deleted or not, with that unit's id passed in as a parameter. Each type of storage would have its own id values; for cacheStorage, for instance, the id would be each cache name, while for serviceWorkerRegistrations, the id would be the scriptURL associated with the service worker registration.

The return value of the single public method would summarize what was actually deleted, allowing web applications to take specific actions depending on what ended up being cleared out. For instance, a web app might want to redirect itself to a login page upon completion of the "kill switch" if it sees that there was a specific service worker registration that was previously active and is now removed.
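The per-unit predicate idea can be expressed as one generic helper that each storage type reuses: list the ids, ask the predicate about each one, delete the approved ones, and sort the outcomes into cleared and failed ids. A minimal sketch; `clearUnits`, `listIds`, and `deleteUnit` are hypothetical names, and counting a deletion as failed when it throws or resolves to `false` is an assumption.

```typescript
// Same shape as the proposal's AsyncPredicateFunction.
type AsyncPredicateFunction = (id: string) => Promise<boolean>;

interface UnitResult {
  cleared: string[];
  failures: string[];
}

async function clearUnits(
  listIds: () => Promise<string[]>,
  deleteUnit: (id: string) => Promise<boolean>,
  predicate: AsyncPredicateFunction,
): Promise<UnitResult> {
  const result: UnitResult = { cleared: [], failures: [] };
  for (const id of await listIds()) {
    try {
      // Skip units the predicate wants to keep.
      if (!(await predicate(id))) continue;
      // A deletion that resolves to false (as CacheStorage.delete can) is a failure.
      if (await deleteUnit(id)) {
        result.cleared.push(id);
      } else {
        result.failures.push(id);
      }
    } catch {
      // A throwing predicate or deletion is recorded as a failure, not rethrown.
      result.failures.push(id);
    }
  }
  return result;
}
```

For CacheStorage, for instance, `listIds` could be `() => caches.keys()` and `deleteUnit` could be `(cacheName) => caches.delete(cacheName)`.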

interface AsyncPredicateFunction {
  (id: string): Promise<boolean>;
}

interface ClearOptions {
  cacheStorage?: AsyncPredicateFunction;
  serviceWorkerRegistrations?: AsyncPredicateFunction;
  cookies?: AsyncPredicateFunction;
  indexedDB?: AsyncPredicateFunction;
}

interface StorageToItems {
  cacheStorage: Array<string> | null;
  serviceWorkerRegistrations: Array<string> | null;
  cookies: Array<string> | null;
  indexedDB: Array<string> | null;
}

interface ClearReturnValue {
  cleared: StorageToItems,
  failures: StorageToItems,
}

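// For any property in ClearOptions that's undefined,
// default to `async (id) => true`.
// (Parameter initializers aren't allowed in a declaration, so the
// default is documented here and applied by the implementation.)
declare function clear(options?: ClearOptions): Promise<ClearReturnValue>;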

The usage would look like:

const {cleared, failures} = await clear({
  // Only delete caches that include "workbox" in their name.
  cacheStorage: async (id) => id.includes('workbox'),

  // Delete all service worker registrations.
  serviceWorkerRegistrations: async (id) => true,

  // Don't delete any cookies.
  cookies: async (id) => false,

  // Omitting indexedDB would be equivalent to the default:
  // indexedDB: async (id) => true,
});

for (const [storageType, items] of Object.entries(failures)) {
  if (items === null) {
    console.warn(`Unable to clear ${storageType} in the current browser.`);
  } else if (items.length > 0) {
    // Only warn when something actually failed to be deleted.
    console.warn(`Could not delete the following items from ${storageType}:`, items);
  }
}

if (failures.indexedDB === null) {
  // Manually clean up IndexedDB on browsers that don't support
  // iterating over DB names.
}

// Entries in `cleared` may be null too, so guard before using them.
if ((cleared.serviceWorkerRegistrations ?? []).length > 0) {
  // Do something if there was at least one SW registration deleted.
}

if (cleared.indexedDB?.includes('my-indexeddb-name')) {
  // Do something if there was an IndexedDB with that database
  // name that was deleted.
}
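Behind that API, the defaulting rule and the `null` convention for unsupported storage could be wired together roughly as follows. This is a minimal sketch, not Workbox code: `clearWithHandlers` and `StorageHandler` are hypothetical, and the per-storage deletion logic is injected as a handler so that an unavailable API is simply a missing handler. Reporting `null` failures when a handler throws before it can list anything is an assumption.

```typescript
type AsyncPredicateFunction = (id: string) => Promise<boolean>;
type StorageType = 'cacheStorage' | 'serviceWorkerRegistrations' | 'cookies' | 'indexedDB';
type ClearOptions = Partial<Record<StorageType, AsyncPredicateFunction>>;
type StorageToItems = Record<StorageType, string[] | null>;

// Hypothetical per-storage worker: deletes what the predicate approves
// and reports the outcome. Left out entirely when the API isn't available.
type StorageHandler = (predicate: AsyncPredicateFunction) =>
  Promise<{ cleared: string[]; failures: string[] }>;

const STORAGE_TYPES: StorageType[] = [
  'cacheStorage', 'serviceWorkerRegistrations', 'cookies', 'indexedDB',
];

async function clearWithHandlers(
  handlers: Partial<Record<StorageType, StorageHandler>>,
  options: ClearOptions = {},
): Promise<{ cleared: StorageToItems; failures: StorageToItems }> {
  const cleared = {} as StorageToItems;
  const failures = {} as StorageToItems;

  // Run every storage type concurrently; each branch catches its own errors.
  await Promise.all(STORAGE_TYPES.map(async (type) => {
    const handler = handlers[type];
    if (!handler) {
      // API unavailable in this context: report null, as proposed.
      cleared[type] = null;
      failures[type] = null;
      return;
    }
    // The proposal's default for an omitted predicate.
    const predicate: AsyncPredicateFunction = options[type] ?? (async () => true);
    try {
      const outcome = await handler(predicate);
      cleared[type] = outcome.cleared;
      failures[type] = outcome.failures;
    } catch {
      // The handler failed before it could report per-item results.
      cleared[type] = [];
      failures[type] = null;
    }
  }));

  return { cleared, failures };
}
```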

Open questions

Handling missing APIs

Cookies aren't widely accessible from inside service workers (modulo this origin trial). Iterating over a list of IndexedDB databases isn't widely supported either. This means that, for some users, the module won't even be able to attempt clearing cookies or IndexedDB.

Right now, the idea is to reflect that by returning null for a given storage type in the failures return value. This at least lets the developer know that nothing was cleared, and it's then up to the developer to follow up with manual steps, e.g., clearing cookies via the WindowGlobalScope.
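For the IndexedDB case, feature detection could look like the sketch below: if `indexedDB.databases()` isn't available, the helper returns `null`, which the caller can then report under `failures.indexedDB`. The factory is passed in (in a browser you would pass the global `indexedDB`); `IDBFactoryLike` and `listDatabaseNames` are hypothetical names.

```typescript
// Only the optional databases() method of IDBFactory is needed here.
interface IDBFactoryLike {
  databases?: () => Promise<Array<{ name?: string }>>;
}

async function listDatabaseNames(factory: IDBFactoryLike | undefined): Promise<string[] | null> {
  // No IndexedDB at all, or no way to enumerate databases: report null.
  if (!factory || typeof factory.databases !== 'function') {
    return null;
  }
  const infos = await factory.databases();
  // IDBDatabaseInfo.name is optional in the type definitions, so drop missing names.
  return infos
    .map((info) => info.name)
    .filter((dbName): dbName is string => typeof dbName === 'string');
}
```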

id values

It might make sense to use richer id values for certain types of predicate functions. For example, for serviceWorkerRegistrations, the predicate could receive the ServiceWorkerRegistration object or its scope instead of a string corresponding to the scriptURL.

Similarly, what's the right indexedDB value? A database name?
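One way to explore richer id values is to make the predicate generic over the unit type, so a service worker predicate could inspect a registration-like object rather than a bare string. A sketch under that assumption; `UnitPredicate`, `selectUnits`, and `RegistrationInfo` are hypothetical, with `RegistrationInfo` holding the `scope` and the active worker's `scriptURL`, both of which the real `ServiceWorkerRegistration` exposes.

```typescript
// Generic variant of the proposal's predicate type.
type UnitPredicate<T> = (unit: T) => Promise<boolean>;

// Hypothetical shape, e.g. built from a real registration as
// { scope: reg.scope, scriptURL: reg.active?.scriptURL ?? null }.
interface RegistrationInfo {
  scope: string;
  scriptURL: string | null;
}

async function selectUnits<T>(units: T[], predicate: UnitPredicate<T>): Promise<T[]> {
  const decisions = await Promise.all(units.map((unit) => predicate(unit)));
  return units.filter((_, index) => decisions[index]);
}

// Predicates that are awkward to express with scriptURL strings alone.
const underAppScope: UnitPredicate<RegistrationInfo> =
  async (reg) => new URL(reg.scope).pathname.startsWith('/app/');
const hasNoActiveWorker: UnitPredicate<RegistrationInfo> =
  async (reg) => reg.scriptURL === null;
```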

localStorage? AppCache? Other storage mechanisms?

There are probably other types of storage—localStorage and AppCache both come to mind—that we could attempt to clear out as well, but those are mainly used by legacy applications. It's not clear that we should support them in this library.

Failures?

Should the clear() method ever fail? What happens if one of the attempts to clear something, like an IndexedDB database, does not succeed? What happens if an attempt to delete something doesn't complete within a reasonable amount of time?

Presumably it would be better to ensure that the method always fulfills and attempts to delete as much as possible, and then reports back any failures to delete items as part of the return value. This is open for discussion, though.
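If `clear()` should always fulfill, each individual deletion could be raced against a timer so that a hung operation is reported as a failure instead of stalling the whole call. A minimal sketch; `settleWithin`, `attemptAll`, and the 5000 ms default are illustrative assumptions. Note that a timed-out deletion isn't cancelled; it is simply no longer awaited.

```typescript
type Outcome = 'deleted' | 'failed' | 'timed-out';

// Resolves (never rejects) with the operation's outcome, or 'timed-out'
// if the operation hasn't settled within timeoutMs.
function settleWithin(operation: Promise<boolean>, timeoutMs: number): Promise<Outcome> {
  return new Promise<Outcome>((resolve) => {
    const timer = setTimeout(() => resolve('timed-out'), timeoutMs);
    operation.then(
      (ok) => { clearTimeout(timer); resolve(ok ? 'deleted' : 'failed'); },
      () => { clearTimeout(timer); resolve('failed'); },
    );
  });
}

async function attemptAll(
  ids: string[],
  deleteUnit: (id: string) => Promise<boolean>,
  timeoutMs = 5000,
): Promise<{ cleared: string[]; failures: string[] }> {
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const outcomes = await Promise.all(
    ids.map((id) => settleWithin(Promise.resolve().then(() => deleteUnit(id)), timeoutMs)),
  );
  return {
    cleared: ids.filter((_, i) => outcomes[i] === 'deleted'),
    failures: ids.filter((_, i) => outcomes[i] !== 'deleted'),
  };
}
```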

Relationship with other Workbox libraries

workbox-clear-storage is intended to be usable standalone by any web application, regardless of whether it's using a service worker that includes the other Workbox libraries.

It would likely share some of the IndexedDB code from workbox-core.

Posted by jeffposnick, Jul 15 '20 20:07

There are probably other types of storage—localStorage and AppCache both come to mind—that we could attempt to clear out as well, but those are mainly used by legacy applications.

In my Sapper PWA, I'm using a number of Firebase products as well as GunDB, which by default stores offline data in localStorage (with IndexedDB available as a config option). With that in mind, it would be great to add localStorage to the list of storage types that get cleared.

Posted by evdama, Jul 21 '20 06:07

@evdama: A complication around support for localStorage is that it can't be accessed from inside of a service worker. So if we did support clearing it, that would only work when workbox-clear-storage was used from a window client, not from inside of a service worker.

Maybe that's okay, if folks think they are more likely to use this sort of module from their window client code? But we'd have to make it clear that it will fail when used inside of a service worker.
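If localStorage support were added, the window-only restriction could be surfaced the same way as other missing APIs: return `null` when no Web Storage object is reachable, as inside a service worker. A minimal sketch; `clearWebStorage` and `StorageLike` are hypothetical, and running the predicate per key (rather than per store) is an assumption about how the proposal's id values would map onto Web Storage.

```typescript
// Subset of the Web Storage `Storage` interface used below.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  removeItem(key: string): void;
}

async function clearWebStorage(
  storage: StorageLike | undefined,
  predicate: (key: string) => Promise<boolean> = async () => true,
): Promise<string[] | null> {
  // No Web Storage in this context (e.g. a service worker): report null.
  if (!storage) return null;

  // Snapshot the keys first, since removing items shifts the indexes.
  const keys: string[] = [];
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key !== null) keys.push(key);
  }

  const removed: string[] = [];
  for (const key of keys) {
    if (await predicate(key)) {
      storage.removeItem(key);
      removed.push(key);
    }
  }
  return removed;
}

// In a window:
// clearWebStorage(typeof localStorage === 'undefined' ? undefined : localStorage);
```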

Posted by jeffposnick, Mar 18 '21 15:03

If I understood correctly, this module would allow operations ranging from deleting a single runtime cache to a complete storage wipe, like the one Jake Archibald suggests in the "Recovering after a successful hack" section of this article. Correct? https://jakearchibald.com/2018/when-packages-go-bad/

And I guess what this module is able to do would depend on the service worker's scope, right?

What about deleting data stored in in-memory databases such as LokiJS and datascript? Is this something that could be handled by this module?

Regarding the difficulty of clearing cookies and localStorage/sessionStorage from a service worker, do you think it would be possible, and reasonable, to adopt the following solution?

At build time:

  1. the developer decides what to delete using ClearOptions
  2. based on the configuration specified in step 1, workbox-clear-storage builds a small event handler to be executed in a WindowGlobalScope

At runtime:

  1. the user/app decides to clear storage
  2. the service worker uses postMessage to send a CustomEvent to the event handler, which performs the necessary localStorage/sessionStorage/cookie operations from a WindowGlobalScope rather than a ServiceWorkerGlobalScope. This would be a way to bypass the current limitations on working with localStorage and cookies in a service worker.
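The runtime half of this idea can be sketched with plain `postMessage`. An actual `CustomEvent` object can't be posted (events aren't structured-cloneable), so the sketch sends a plain data message, which the page receives as a `message` event on `navigator.serviceWorker`. The `CLEAR_WINDOW_STORAGE` message type and both helper names are hypothetical, and the clients and storage objects are passed in so the logic can be exercised outside a browser. Cookies are left out here.

```typescript
// Hypothetical message shape shared by the service worker and the page.
interface ClearWindowStorageMessage {
  type: 'CLEAR_WINDOW_STORAGE';
  localStorage: boolean;
  sessionStorage: boolean;
}

interface PostableClient {
  postMessage(message: unknown): void;
}

interface ClientsLike {
  matchAll(options: { type: 'window'; includeUncontrolled: boolean }): Promise<ReadonlyArray<PostableClient>>;
}

// Service worker side: ask every open window to clear what the SW can't reach.
async function requestWindowClear(clients: ClientsLike, message: ClearWindowStorageMessage): Promise<number> {
  const windows = await clients.matchAll({ type: 'window', includeUncontrolled: true });
  for (const client of windows) client.postMessage(message);
  return windows.length;
}

// Page side: act on the message using the window's own storage objects.
function handleClearMessage(
  data: unknown,
  storages: { localStorage?: { clear(): void }; sessionStorage?: { clear(): void } },
): boolean {
  if (typeof data !== 'object' || data === null) return false;
  const msg = data as Partial<ClearWindowStorageMessage>;
  if (msg.type !== 'CLEAR_WINDOW_STORAGE') return false;
  if (msg.localStorage) storages.localStorage?.clear();
  if (msg.sessionStorage) storages.sessionStorage?.clear();
  return true;
}

// In the page:
// navigator.serviceWorker.addEventListener('message', (event) =>
//   handleClearMessage(event.data, { localStorage, sessionStorage }));
```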

Posted by jackdbd, Apr 24 '21 14:04