A new API for work during unload

Open domenic opened this issue 10 months ago • 27 comments

What problem are you trying to solve?

It's a known issue that many sites want to perform some work during document unloading. This usually includes writing to storage, or sending information to servers. (Previous discussion: https://github.com/whatwg/html/issues/963.)

Some of the simpler use cases are solved by APIs like fetch(..., { keepalive: true }), or by using synchronous storage APIs like localStorage.

But the more complex cases, such as writing to async storage APIs like IndexedDB, or performing some sort of async operation before the HTTP request, require more work. (Examples of such pre-request async operations include using WebCrypto to hash/encrypt data, or using CompressionStream to compress it.)
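For example, a pagehide handler that tries to flush unsaved changes to IndexedDB illustrates the problem; this sketch assumes a hypothetical "drafts" object store and an app-defined unsavedChanges value:

```javascript
// Illustrative sketch of the pattern that is NOT reliable today: the
// document can be destroyed before the async IndexedDB work finishes.
window.addEventListener("pagehide", () => {
  const open = indexedDB.open("app-db");
  open.onsuccess = () => {
    const tx = open.result.transaction("drafts", "readwrite");
    tx.objectStore("drafts").put(unsavedChanges, "latest");
    // No guarantee this transaction ever commits; the write can be
    // silently lost when the document is torn down.
  };
});
```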

What solutions exist today?

The current best solution the platform offers for this is service workers. By sending messages to the service worker, the page lets the service worker use its ability to run in the background to perform the appropriate actions. The message can carry along any data necessary to perform those actions, e.g., the not-yet-encrypted-or-compressed payload, or the user's unsaved changes that need to be written to IndexedDB.
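Concretely, the pattern looks something like this sketch (the message shape and the writeToIndexedDB helper are illustrative, not part of any spec):

```javascript
// page.js — hand the unsaved payload to the service worker during pagehide.
window.addEventListener("pagehide", () => {
  navigator.serviceWorker.controller?.postMessage({
    type: "flush",
    payload: unsavedChanges, // structured-cloned into the worker
  });
});

// sw.js — the service worker can keep running after the page is gone.
self.addEventListener("message", (event) => {
  if (event.data.type === "flush") {
    // writeToIndexedDB is an app-defined async function.
    event.waitUntil(writeToIndexedDB(event.data.payload));
  }
});
```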

However, requiring service workers is pretty heavyweight for this use case. Even if there are no fetch handlers, the disk space consumption of the service worker registration database means this is hard to deploy at scale. And, it uses up a process's worth of memory---not only during unload time, when the work is being performed, but the entire time any document from that origin is open.

Additionally, the developer experience of service worker registration, installation, upgrading, client claiming, etc. is a lot to manage when the goal is just to run some background code after unload. For example, service workers require a separate same-origin file to be hosted, instead of allowing creation from blob: URLs, which means that libraries for this functionality need to consist of two files, not just one. (And it gets worse if the library needs to integrate with a site's existing service worker script!)

How would you solve it?

We (myself, @fergald, @pmeenan) think there are possible new APIs which could allow sites to perform this kind of processing, but at a lower cost. Here are the ideas we've come up with:

  1. A PageHideWorklet. This would be a special type of worklet which a document would register early in its lifetime. Then, during unload, right after firing the pagehide event, it gets spun up and runs its code. The document would need to synchronously set the data that the worklet plans to consume, either continuously throughout the document's lifetime, or in the pagehide handler (or both). But the worklet could run asynchronously for some amount of time (see below).

    On the implementation level, this could be done either with a separate process for the worklet, spun up at unload time, or with an in-process worklet plus some code that keeps the unloading document's process alive, even while stopping its event loop and freeing up most of its resources.

  2. A minor extension to SharedWorker. SharedWorkers are already reasonable tools for this: they don't have heavyweight registrations and persistent state like service workers, and they are allowed per spec to stay alive after document unloading for some time.

    In theory, this could involve no spec changes, just implementation changes to allow shared workers to stay alive for longer. In practice, it would probably be better to include a hint at construction time that this SharedWorker is intended to perform post-unload processing, and so the browser should keep it alive for a longer time. Something like new SharedWorker(url, { usageHint: "after-pagehide-processing" }).

We're currently leaning toward (2), as it seems like a simple extension of what exists today.
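Under proposal (2), usage might look like the following sketch; the usageHint value is the tentative name from above, and flushToStorage is an app-defined stand-in:

```javascript
// page.js — create the worker early in the document's lifetime.
const worker = new SharedWorker("unload-worker.js", {
  usageHint: "after-pagehide-processing", // tentative option from this proposal
});
window.addEventListener("pagehide", () => {
  worker.port.postMessage({ type: "flush", payload: unsavedChanges });
});

// unload-worker.js — allowed to keep running for an implementation-defined
// time after all of its documents have unloaded.
self.addEventListener("connect", (event) => {
  const port = event.ports[0];
  port.onmessage = async (e) => {
    if (e.data.type === "flush") await flushToStorage(e.data.payload);
  };
});
```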

With regards to privacy/security, the intent here would be to be exactly as powerful as service workers are today. Today there are implementation-defined limits on how long service workers stay alive after all of the documents from their origin are closed, and different browsers have made different choices for them. (And I believe some have discussed changing these limits over time, or in reaction to other signals.) We would allow these mechanisms to operate for those same implementation-defined amounts of time.

Anything else?

We also considered extendable `pagehide` event handlers, but we don't like them very much.

The idea: inside a pagehide event, event.waitUntil(promise) would allow you to extend the document's lifetime and continue running JavaScript, while the document unloads. This would be up to some implementation-defined limit (per above).
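In code, the rejected idea would have looked roughly like this sketch (event.waitUntil() on pagehide does not exist; compress is an app-defined stand-in):

```javascript
window.addEventListener("pagehide", (event) => {
  // Hypothetical API: pagehide events have no waitUntil() today.
  event.waitUntil((async () => {
    const compressed = await compress(unsavedChanges);
    await fetch("/save", { method: "POST", body: compressed, keepalive: true });
  })());
});
```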

In parallel, the user agent would be loading the new document. This would not block loading the new document in any way: even if the new document completely finishes before the promise from the old document settles, we could visibly swap in the new document, while the old document continues running. It is sort of like keeping the old document in bfcache, except JavaScript continues to run. Chrome already has this sort of document state for ~3 seconds for some cross-process navigations, and I believe other implementations might as well.

This is probably the most convenient option for web developers, as they can colocate all their code into the pagehide handler. But keeping unloaded documents alive in that way, even with opt-in, is scary. And the memory consumed by the document, with all the DOM and JS objects it has built up throughout its lifetime, is probably quite high.

domenic avatar Feb 05 '25 05:02 domenic

It is a bit unclear why a worklet would be used and not a (temporary) dedicated worker. IDB and fetch etc are after all defined to work in workers.

Do we need SharedWorker?

smaug---- avatar Feb 07 '25 13:02 smaug----

A dedicated worker is pretty tightly tied to its owner document's lifetime. Are you thinking we could somehow loosen that, instead of using shared workers?

I thought using shared workers would be easier, both from a spec and implementation point of view, since they are already separate from any single document's lifetime. But maybe that's not necessarily the case?

domenic avatar Feb 10 '25 05:02 domenic

Colleagues and I are rather wary of the longish timeout service workers have today, and as I understand it, a big reason service workers are even kept alive is to reduce the cost of the ongoing navigation. This was discussed quite a bit as part of the fetchLater() work (which is still ongoing). I guess this is not meant to replace that?

annevk avatar Feb 10 '25 13:02 annevk

SharedWorker seems least bad for this use-case[1], especially since MessagePorts can now generate close events which provides symmetry to the SharedWorkerGlobalScope "connect" event and this is conceivably something sites could already be doing.

1: In particular, I agree that a pagehide event with a waitUntil for a document would have terrifying lifetime implications. And a PageHideWorklet would be at odds with worklets currently not using tasks.

asutherland avatar Feb 10 '25 21:02 asutherland

Hmm, that close event still has a pretty severe unfixed bug: #10201.

annevk avatar Feb 11 '25 08:02 annevk

Colleagues and I are rather wary of the longish timeout service workers have today and as I understand a big reason service workers are even kept alive is to reduce the cost of the ongoing navigation.

Understood. I tried to address this in the OP by suggesting that this new mode for shared workers would be subject to whatever implementation-defined limits a browser places on service workers today.

Basically, we should not make this new mode any worse than service workers, as otherwise web sites will need to continue to use service workers for this use case.

I guess this is not meant to replace that?

Correct. They apply to related but separate use cases. fetchLater() is an upgrade over fetch()-in-unload because it can be made more reliable by setting up the fetch ahead of time, and putting more in the browser's hands.

But, there are cases where neither fetchLater() nor fetch()-in-unload can work today:

  • Non-fetch use cases, like async storage (e.g., writing to IndexedDB)
  • Cases where async steps (like encryption or compression) are required before fetching.
    • In some cases you can try to do these async steps before calling fetchLater(), but there's a chance the page will be unloaded during your async steps, and then you'll lose the data.
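The race in the last sub-bullet looks like this sketch (encrypt is an app-defined async step; fetchLater is the proposed API):

```javascript
window.addEventListener("pagehide", async () => {
  // Race: the document can be fully torn down while encrypt() is still
  // pending, in which case fetchLater() below is never reached and the
  // data is lost.
  const encrypted = await encrypt(pendingData);
  fetchLater("/report", { method: "POST", body: encrypted });
});
```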

domenic avatar Feb 12 '25 04:02 domenic

Let's tentatively call this API new SharedWorker(url, { extendedLifetime: true }).

What do we do if multiple clients have mismatched values of extendedLifetime? Some possibilities:

  1. If >=1 new SharedWorker() invocation includes { extendedLifetime: true }, then we treat the shared worker as having an extended lifetime. Basically, any client can extend the lifetime at any time.

  2. A shared worker's extended lifetime is specified as being relative to the clients that requested that extended lifetime. If page A requests extended lifetime, and page B doesn't, and then page A disappears, and then 10 minutes later page B disappears, the worker shuts down immediately, since page B did not care about extended lifetime.

  3. Only the first new SharedWorker() invocation controls the lifetime. The 2nd onward have extendedLifetime ignored. (And maybe we should log a console warning explaining that it was ignored.)

  4. Only the first new SharedWorker() invocation controls the lifetime. All others have to match, and if they don't match, we throw an exception or fail the worker creation.

For our use case, any of these will work. We expect people to be using a specific shared worker for unloading purposes, and always calling with { extendedLifetime: true } in their pagehide handler.

(1), (3), and (4) are pretty easy to implement. (2) adds a bit more complexity, but is kind of theoretically nice in some ways.

(4) might be simplest to start with since it can evolve into any of the others later.

Edit: @yoshisatoyanagisawa reminded me that we error the worker if type or credentials options mismatch. https://html.spec.whatwg.org/#dom-sharedworker step 11.4. So going with (4) initially seems like an especially good idea now.
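For concreteness, under option (4) a mismatch would fail the same way mismatched type or credentials do; a sketch with the tentative name:

```javascript
// First same-origin invocation establishes the worker's lifetime mode.
const a = new SharedWorker("unload.js", { extendedLifetime: true });

// A later invocation from another same-origin page that omits the option
// would, under option (4), error the worker / throw, just like mismatched
// `type` or `credentials` options do today.
const b = new SharedWorker("unload.js");
```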

domenic avatar Apr 10 '25 04:04 domenic

Thanks @domenic for listing the possible behaviors on option mismatches. I also came up with another corner case while checking the Chromium code.

extendedLifetime aims to extend the worker's lifetime after all clients have been unloaded. But suppose a new client gets created after all clients have unloaded, while the SharedWorker is still running because extendedLifetime is true. What is the expected behavior?

  1. Connect to the existing SharedWorker and extend its lifetime until the newly added client unloads, plus the extendedLifetime duration.
  2. Create a new SharedWorker instance, leaving the existing SharedWorker to be destructed after the duration.

Focusing on the use case mentioned in https://github.com/whatwg/html/issues/10997#issue-2831961444, Option 2 might be enough. However, considering the case of using the extendedLifetime option to avoid SharedWorker creation after the navigation, Option 1 may be preferred by web developers.

yoshisatoyanagisawa avatar Apr 10 '25 08:04 yoshisatoyanagisawa

Good catch. I agree (1) seems nicer and less wasteful. If there are implementation/architecture reasons why it is especially hard, then we could consider (2), since we don't have strong use cases for (1) behavior. But I would default to (1) if possible.

domenic avatar Apr 11 '25 01:04 domenic

@domenic 4 above (exceptions if the lifetimes don't match) means that changing from non-extended to extended or vice versa is tricky and maybe impossible for sites that people keep open constantly (e.g. gmail, calendar, facebook).

fergald avatar Apr 21 '25 07:04 fergald

@fergald Can I ask you to elaborate more on the situation?

I think the keys to look up SharedWorker are:

  • SharedWorker script URL,
  • name field in the option,
  • storage key,
  • and first-party/third-party context (if in the SameSiteCookies experiment)

Then, I believe a new SharedWorker is unlikely to match an existing SharedWorker, because it might have a different script URL. If site owners want to migrate their existing SharedWorker to an extendedLifetime SharedWorker, I guess they can set a different script URL and/or name to avoid the exception. I do not think the limitation is too strict.

yoshisatoyanagisawa avatar Apr 23 '25 10:04 yoshisatoyanagisawa

@yoshisatoyanagisawa Changing the URL would mean that there can be a period where 2 different SharedWorkers are active. This could be a problem if the SharedWorker manages some global state and this state is also involved in the extended lifetime task. I don't know what people do with SharedWorkers in reality, so maybe this is not a real concern.

fergald avatar Apr 24 '25 03:04 fergald

@fergald I feel that this can be a general issue with updating a SharedWorker script, even without extendedLifetime support. However, since matching does not do a byte-to-byte script comparison the way ServiceWorkers do, an update can only take effect after all clients have gone, and the proposal breaks that. Let me go with this limitation (i.e. prevent mixing extendedLifetime statuses), and revisit based on real-world feedback from the origin trial (OT).

yoshisatoyanagisawa avatar Apr 24 '25 08:04 yoshisatoyanagisawa

I've created an explainer for this change at https://gist.github.com/domenic/c5bd38339f33b49120ae11b3b4af5b9b, to make horizontal review easier. But, let's keep using this issue as the discussion space.

domenic avatar May 09 '25 07:05 domenic

@wanderview had a few responses and questions to the explainer.

We propose that extended lifetime shared workers with no clients stay alive for exactly the same amount of time that a service worker stays alive, when it has no clients.

It seems like this would double the amount of time that a site could do background processing without a client window. Consider: if there is a service worker controlling the shared worker, then you get 1) the extra time for the shared worker, and then 2) after the shared worker dies, you get the current service worker time with no clients.

That is not intended! We should specify that extended lifetime shared worker clients do not "count" for determining the service worker lifetime. @yoshisatoyanagisawa, do you agree?

Is the extra time guaranteed?

No, no more than it is guaranteed for service worker.

Does this make launching SharedWorker on android harder?

I guess this is a Chromium-specific question. No, this proposal provides a motivation for launching SharedWorker on Android, and so makes launching it easier.

domenic avatar May 13 '25 04:05 domenic

We should specify that extended lifetime shared worker clients do not "count" for determining the service worker lifetime.

It seems like only windows should grant lifetime. Dedicated workers automatically effectively get lifetime because they are ultimately rooted in a window. SharedWorkers get effective lifetime when attached to a window.

wanderview avatar May 13 '25 13:05 wanderview

It seems like only windows should grant lifetime. Dedicated workers automatically effectively get lifetime because they are ultimately rooted in a window. SharedWorkers get effective lifetime when attached to a window.

I agree with this, but we might want to formalize and reuse what I believe browsers may already be doing: when ServiceWorker A calls ServiceWorker.postMessage() on a ServiceWorker B that is not yet running, the current lifetime deadline of ServiceWorker A can propagate to ServiceWorker B. Or at least, that's what we do in Gecko now, and we were under the impression that was what Blink had been doing. I do see the Blink bug on "Support navigator.serviceWorker in WorkerNavigator" is currently still open but without much recent activity, but that might be orthogonal.

This would let us address the situation where:

  • The SharedWorker invokes ServiceWorker.postMessage on a ServiceWorker that is not currently running.
  • The SharedWorker is controlled by a ServiceWorker that is not currently running and the SharedWorker performs a fetch which should dispatch a "fetch" event on the ServiceWorker.

In that case, specifying the following could work:

  • A SharedWorker with extendedLifetime is given a lifetime deadline/grant based on when its last owning window goes away (and not based on when the SharedWorker is told about it, to avoid attempts to extend lifetime by clogging up the task queue, which might delay such a notification).
  • The SharedWorker can propagate its lifetime to the relevant ServiceWorkers, with implementations being able to make a call about whether there's enough lifetime left to justify spinning up a ServiceWorker. (Firefox currently requires there to be 5 seconds left on the clock, but that's arbitrary.)

asutherland avatar May 13 '25 21:05 asutherland

That is not intended! We should specify that extended lifetime shared worker clients do not "count" for determining the service worker lifetime. @yoshisatoyanagisawa, do you agree?

Yes. That makes sense to me.

I initially failed to understand the situation because SharedWorker is only exposed to Window. https://html.spec.whatwg.org/multipage/workers.html#shared-workers-and-the-sharedworker-interface

However, I guess the scenario is that a SharedWorker registers a ServiceWorker. That is currently not possible in Chromium, but it will be possible when https://issues.chromium.org/issues/40364838 is resolved. In that situation, do we allow the ServiceWorker to extend the SharedWorker's lifetime?

I think we agree to say no to the question.

For lifetime propagation, Chromium may not have such a mechanism, and I am afraid it makes things too complex. Is there a use case where lifetime propagation becomes mandatory?

yoshisatoyanagisawa avatar May 14 '25 03:05 yoshisatoyanagisawa

However, I guess the scenario is that SharedWorker registers ServiceWorker.

In theory a window could register a ServiceWorker with a scope that matches the SharedWorker script URL. According to the spec that should result in a ServiceWorker-controlled SharedWorker.

wanderview avatar May 14 '25 14:05 wanderview

I had an idea while reviewing this with the TAG, and I wonder what obvious problems I'm missing. :) Basically, why does the extended lifetime have to be a property of the SharedWorker instance and not the operation it's running? Imagine something like the extendable pagehide event handler but as a method on the SharedWorker:

async function encryptAndSend(cryptoKey, analyticsData) {
  const encryptedData = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv: new Uint8Array(12) }, 
    cryptoKey, 
    analyticsData
  );
  fetch("/send-analytics", { method: "POST", body: encryptedData, keepalive: true });
}
extendLifetimeUntil(encryptAndSend(cryptoKey, analyticsData));

You'd keep the extension limit from the current proposal, but once the Promise resolves, the UA would be free to kill the SW immediately instead of always needing to keep its memory allocated until the time limit.

jyasskin avatar May 20 '25 17:05 jyasskin

We need to know about the SharedWorker potentially outliving the page at the moment we're creating the SharedWorker or it's quite possible the SharedWorker will never stay alive long enough to run any JS code that could call extendLifetimeUntil.

For ServiceWorkers, the ExtendableEvents all are coming via whatever decided to Fire the Functional Event and so there's an implicit lifetime grant that lasts through the end of the task dispatching the event. While the content page in the worker can then use waitUntil to request that lifetime be extended, that's structurally more like delaying an "all done" message rather than sending a message to the owner saying "please extend my lifetime", with the latter being race-prone.

One could refine what you're proposing to create something like SharedWorker.postExtendableMessage which then would look suspiciously like ServiceWorkers, but it would still be really helpful from an implementation perspective to know the SharedWorker is going to have special semantics when it's created.

asutherland avatar May 21 '25 05:05 asutherland

We could say that getting a message (or starting up) always preserves the Shared Worker for the single task it takes to handle the message, even if the page unloads right away. That almost certainly falls into the existing permission to preserve it until the next page loads.

jyasskin avatar May 21 '25 14:05 jyasskin

  4. Only the first new SharedWorker() invocation controls the lifetime. All others have to match, and if they don't match, we throw an exception or fail the worker creation.

We've chosen route (4) for now, as this matches the behavior of other options to the SharedWorker constructor like type or credentials.

I agree you've chosen something that is technically consistent with the handling of the existing options, but I don't think extendedLifetime mismatches are as fatal as type/credentials mismatches would be, so I'm not sure we need to be consistent with their strictness. type/credentials observably impact what kind of script environment you get, in a way that's incompatible with Worker requests with non-matching options. On the other hand, extendedLifetime doesn't impact how the server serves the script, or any other intra-environment characteristics that would be bad to expose to developers requesting a mismatched extendedLifetime worker. This makes me think we can get away with a more lenient mismatch policy, which could make this feature more useful. For example, there's no reason to reject a request for a non-lifetime-extended Shared Worker when one exists, right?

What if, when an extended-lifetime Shared Worker exists and a request for a non-extended one comes in, we grant it? And when a non-extended worker exists and a request for an extended-lifetime one comes in, we do one of:

  • Reject it, if it's indeed as important as @asutherland says it is to know at start-up time whether a Worker is "extended"
  • Extend the lifetime of the existing non-extended worker, if we don't need to know at start-up time whether a Worker is "extended"
  • Create a new extended worker, to fulfill the request

It just seems like such a shame if you request an extended lifetime worker and get an exception, just because some other script (that doesn't care about its lifetime) happened to new SharedWorker() first.

domfarolino avatar Jun 01 '25 20:06 domfarolino

Yeah, if we knew of realistic cases where scripts might not coordinate and somehow decide to create the same shared worker but with different choices for extendedLifetime, then I agree it might be worth loosening it. But I can't think of any such cases: certainly not the cases envisioned in the OP.

domenic avatar Jun 02 '25 01:06 domenic

Yeah, if we knew of realistic cases where scripts might not coordinate and somehow decide to create the same shared worker but with different choices for extendedLifetime, then I agree it might be worth loosening it.

If a site wants to migrate a ShW from being non-extended to extended and the user has tabs running the old and new versions of the site, then it will fail. It may not be possible to just change the URL and run them in parallel, as there may be something in the existing ShW that should be a global singleton. I think I raised this in another thread somewhere, but I can't find it. My recollection is that it was considered a valid problem.

fergald avatar Jun 02 '25 04:06 fergald

I believe we discussed in the comment at https://github.com/whatwg/html/issues/10997#issuecomment-2826782661 that we would decide on the conclusion for this issue after reviewing the feedback from OT.

yoshisatoyanagisawa avatar Jun 02 '25 07:06 yoshisatoyanagisawa

In reply to https://github.com/whatwg/html/issues/10997#issuecomment-2927853183

What if when an extended-lifetime Shared Worker exists, and a request for a non-extended one comes it, we grant it. And if a request for an extended lifetime worker exists, we do one of:

* Reject it, if it's indeed as important as [@asutherland](https://github.com/asutherland) says it is to know at start-up time whether a Worker is "extended"

Just to clarify, my concern here is a logistical one related to the situation where the SharedWorker is being created close to the teardown of the window that would own the SharedWorker instance. If IPC/messaging channels are being used that are torn down, as an optimization, in a way that causes their messages never to be propagated to or processed at their destination, then we potentially have to take special action when initially sending the construction request.

The situation here sounds like one where:

  • The non-extendedLifetime SharedWorker-holder global must inherently have created the SharedWorker earlier and presumably will continue to exist
  • The extendedLifetime SharedWorker-creation may be happening from a doomed global, but we'd know it at creation time, so we can take the special action.

So my concern isn't really a factor here.

That said, I agree with @domenic that conforming to the existing option 4 behavior seems preferable. It's easier to reason about, specify, and test for all involved.

The global-singleton use-case @fergald raises is an important one, but Web Locks have excellent webcompat and provide a means of ensuring exclusive temporal ownership in the way the SharedWorker constructor cannot.

In particular, the closing flag will hide the worker from the constructor's matching logic, but there are no hard deadlines for user agents to terminate the worker once closing is set; it's just a "may", so anything that depends on SharedWorker to provide a singleton guarantee is some combination of unsound and dependent on undefined browser-specific behavior.
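A sketch of the Web Locks pattern described above: the worker holds a named lock for its entire life, so a successor instance (e.g. during a script-URL migration) queues until the old instance is actually gone. The lock name and initializeGlobalState are illustrative:

```javascript
// Inside the SharedWorker script.
navigator.locks.request("app-global-state", async () => {
  // initializeGlobalState is an app-defined setup of the singleton state.
  await initializeGlobalState();
  // Never resolve: the lock is released only when this worker global is
  // destroyed, at which point a queued successor acquires it.
  await new Promise(() => {});
});
```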

asutherland avatar Jun 03 '25 00:06 asutherland

Just FYI, but Chrome will start the origin trial from M139. https://gist.github.com/domenic/c5bd38339f33b49120ae11b3b4af5b9b?permalink_comment_id=5665665#gistcomment-5665665

yoshisatoyanagisawa avatar Jul 09 '25 08:07 yoshisatoyanagisawa

I was thinking about WebLocks in the context of SharedWorkers from the BFCache POV when it occurred to me that something similar happens for extended life ShWs. It's not an insurmountable problem but it's a potential footgun that we should be explicit about. Consider the following:

  • Extended-life ShW takes a WebLock (or some other exclusive resource) at the request of one of its documents
  • All Documents with this ShW are destroyed (there's no guarantee they got to run their various handlers to completion)
  • The lock is now held until the ShW is destroyed
  • The user cannot fix this by just closing all windows from site.com, they must do that AND wait NN minutes (with no real way of knowing this).

Perhaps the answer to this is "don't do that", i.e. never hold locks in an extended life ShW where releasing the lock is conditional on a document's action. Maybe that's also a good rule for regular ShWs.

An alternative would be to say that an extended-life ShW holding a WebLock (or other exclusive resource) has a much shorter extended life.

Anyway, I think we should explicitly acknowledge this issue and include guidance.

fergald avatar Jul 23 '25 03:07 fergald

The explainer is missing some details on the interaction with BFCache.

  • What should happen if all of the extended-life ShW's documents go into BFCache? Does the extended life timer start ticking from when the last active document becomes inactive?
  • Does an inactive document keep the ShW alive past its normal 5 min? If not, do we need to evict those documents so that they don't get restored missing their ShW?

fergald avatar Jul 23 '25 08:07 fergald