
Proposal: `beforePutToBFcache` and `afterRestoreFromBFcache` events for DedicatedWorkerGlobalScope

Status: open. hajimehoshi opened this issue 4 years ago (38 comments).

Explainer: beforePutToBFcache and afterRestoreFromBFcache events for DedicatedWorkerGlobalScope

Authors

@hajimehoshi

Introduction

Today, browsers use an optimization called Back-Forward Cache (a.k.a. BFCache) when users navigate their browser's history. BFCache enables an instant loading experience when users go back to a page they have recently visited.

Not every page can use this optimization. Browsers have different heuristics that opt a page out of BFCache when it uses certain features. This feature detection happens not only in documents but also in web workers, so if a worker uses a feature that is not compatible with BFCache, the document might not be eligible for BFCache.

In a document, web authors can listen for the pagehide and pageshow events, which are fired on window. pagehide is fired when a navigation happens, before the decision is made whether the page is put into BFcache. pageshow is fired when the page is shown, including when it is restored from BFcache by a history navigation. pagehide gives web authors the opportunity to handle features that can affect BFcache eligibility. For example, a page with an open IndexedDB connection might not be eligible for BFcache in some browsers. In this case, by closing the connection at pagehide, the page can likely be put into BFcache in those browsers, and it can reopen the connection at pageshow.
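A minimal sketch of the document-side pattern described above. To keep it runnable outside a browser, it assumes `target` behaves like `window` and that `openConnection` is a hypothetical synchronous stand-in for opening an IndexedDB connection (the real `indexedDB.open()` is asynchronous) that returns an object with a `close()` method.

```javascript
// Close a BFCache-blocking connection on pagehide and reopen it when the page
// is restored from BFCache. `target` is assumed to behave like `window`;
// `openConnection` is a hypothetical synchronous opener used for illustration.
function installPageLifecycleHandlers(target, openConnection) {
  const state = { connection: openConnection() };

  target.addEventListener("pagehide", () => {
    // Release the resource before the browser decides on BFCache eligibility.
    if (state.connection) {
      state.connection.close();
      state.connection = null;
    }
  });

  target.addEventListener("pageshow", (event) => {
    // persisted is true only when the page comes back from BFCache.
    if (event.persisted && !state.connection) {
      state.connection = openConnection();
    }
  });

  return state;
}
```

In a page this might be called as `installPageLifecycleHandlers(window, openMyConnection)`, where `openMyConnection` is again a hypothetical helper.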

In a dedicated worker, there is currently no way to handle such lifecycle changes. This can make it difficult to BFcache pages that use dedicated workers. However, simply adding pagehide and pageshow events to DedicatedWorkerGlobalScope is not appropriate. Because a dedicated worker runs on a different thread, the semantics of page lifecycle events such as pagehide and pageshow don't match dedicated workers. For example, if a dedicated worker runs a very long task, pagehide and pageshow might be fired on the page during that task; the worker would only observe them after the long task finishes, which would be semantically incorrect.

To improve this situation, this proposal adds beforePutToBFcache and afterRestoreFromBFcache events to DedicatedWorkerGlobalScope. beforePutToBFcache is fired before the browser decides whether the page should be put into BFcache. afterRestoreFromBFcache is fired after the browser restores the page from BFcache.

Goals

This proposal adds new events, beforePutToBFcache and afterRestoreFromBFcache, to DedicatedWorkerGlobalScope. They give web authors a chance to observe when the associated document is about to be put into BFcache and to react to it. The events are fired only when the dedicated worker doesn't have a shared worker or a service worker in its ancestor chain.
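The ancestor-chain condition above can be sketched as a walk up a worker's owners. The `{ kind, owner }` records are a hypothetical model of browsing contexts and workers, not a browser API.

```javascript
// Hypothetical model: each context is { kind, owner }, where kind is
// "document", "dedicated", "shared" or "service", and owner is the context
// that created it (null for a document or a top-level shared/service worker).
function receivesBFCacheEvents(worker) {
  if (worker.kind !== "dedicated") return false;
  // Skip over any chain of dedicated workers.
  let ctx = worker.owner;
  while (ctx && ctx.kind === "dedicated") {
    ctx = ctx.owner;
  }
  // Eligible only if the chain ends at a document, i.e. no shared or
  // service worker appears anywhere above this worker.
  return Boolean(ctx) && ctx.kind === "document";
}
```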

Non-goals

This proposal doesn't add the events to workers other than dedicated workers, or to worklets. A dedicated worker that doesn't have a shared worker or a service worker in its ancestor chain belongs to one document, so it is natural for it to handle that document's lifecycle events. However,

  • A shared worker and a service worker are shared by multiple documents, so it is not natural to add such events to them.
    • This is the same if there is a shared worker or a service worker in a dedicated worker's ancestor chain.
  • A worklet, like a dedicated worker, belongs to one document, so we might be able to add the events to worklets in the future, but this is not a goal of this proposal. We should revisit worklets later.

API

partial interface DedicatedWorkerGlobalScope {
    attribute EventHandler onBeforePutToBFcache;
    attribute EventHandler onAfterRestoreFromBFcache;
};

The beforePutToBFcache event is dispatched when the user navigates away from the page, before unloading. It is dispatched before the decision is made whether the page is put into BFcache. It is similar to the window's pagehide event, but differs in one respect: if the browser gives up putting the page into BFcache, the event is not fired. For example, if a dedicated worker's task takes very long, the browser might give up using BFcache.

The afterRestoreFromBFcache event is dispatched when the page is restored from BFcache by a history navigation. It is similar to the window's pageshow event, but is fired only when the page is restored from BFcache.

The event interface is PageTransitionEvent. Its read-only persisted attribute is always true.

Example

let db = null;

self.onBeforePutToBFcache = (e) => {
  if (e.persisted) {
    // The owning document is about to be put into BFcache:
    // close the IndexedDB connection, which may block BFcache.
    if (db) {
      db.close();
      db = null;
    }
  }
};

self.onAfterRestoreFromBFcache = (e) => {
  if (e.persisted) {
    // The owning document was restored from BFcache: reopen the connection.
    const req = indexedDB.open("foo");
    req.onsuccess = () => {
      db = req.result;
    };
  }
};

Discussion

Why not postMessage from a frame?

The browser can decide whether the page is cached only after all the beforePutToBFcache events have been handled. postMessage just notifies dedicated workers asynchronously, and the browser cannot wait for them to finish handling the messages, so it is impossible to make that decision with postMessage.

Note that navigation itself happens immediately regardless of whether the page is cached or not.
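A toy model of this argument, not how any browser implements it: `runOnWorker`, `makeWorker` and the `connectionOpen` flag are hypothetical, and `setTimeout` stands in for a task on the worker's thread. When the browser can wait for the event handlers, the eligibility check sees the cleanup; with fire-and-forget postMessage from the page, the check runs before any worker has reacted.

```javascript
// Model of running a callback on the worker's thread: it runs later, and the
// returned promise resolves once it has finished.
function runOnWorker(callback) {
  return new Promise((resolve) => {
    setTimeout(() => {
      callback();
      resolve();
    }, 0);
  });
}

// With a dedicated event, the browser waits for every worker's handler
// before checking BFCache eligibility.
async function decideWithEvent(workers) {
  await Promise.all(workers.map((w) => runOnWorker(w.onBeforePutToBFcache)));
  return workers.every((w) => !w.blocksBFCache());
}

// With postMessage from the page, nothing reports back to the browser, so
// the eligibility check runs before any worker has handled the message.
function decideWithPostMessage(workers) {
  for (const w of workers) {
    runOnWorker(w.onBeforePutToBFcache); // completion intentionally ignored
  }
  return workers.every((w) => !w.blocksBFCache());
}

// Hypothetical worker holding one BFCache-blocking connection.
function makeWorker() {
  const worker = {
    connectionOpen: true,
    blocksBFCache: () => worker.connectionOpen,
    onBeforePutToBFcache: () => {
      worker.connectionOpen = false; // stands in for db.close()
    },
  };
  return worker;
}
```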

References

/CC @domenic @nhiroki @rakina @fergald

hajimehoshi, Oct 14 '21

One dedicated worker is associated with one document, and a dedicated worker should follow its associated document's lifecycle.

This is unfortunately not true in the spec and in Firefox. As discussed in https://github.com/whatwg/html/pull/6379, shared workers can own dedicated workers, and shared workers have no clear owner document.

We might just say that such shared-worker-owned dedicated workers are out of scope for this proposal, but if so we'd need to be explicit, and ensure that that doesn't have any bad impacts.

In particular it seems like some of your "non-goals" section is based specifically on this assumption, so might need to be re-thought.

the same timing

Could you be a bit more concrete about what you are proposing? In particular since they are in different threads I don't think we can guarantee any order. I guess we would post a task from the main thread into the worker thread that fires the event? Is there any ordering with relation to other interesting worker lifecycle events, or posted messages?

domenic, Oct 14 '21

Thanks!

We might just say that such shared-worker-owned dedicated workers are out of scope for this proposal, but if so we'd need to be explicit, and ensure that that doesn't have any bad impacts.

Would it make sense to fire the events in a dedicated worker only when

  • the dedicated worker is not owned by any other workers OR
  • the dedicated worker is owned by another dedicated worker

?

Could you be a bit more concrete about what you are proposing? In particular since they are in different threads I don't think we can guarantee any order. I guess we would post a task from the main thread into the worker thread that fires the event? Is there any ordering with relation to other interesting worker lifecycle events, or posted messages?

Yes, I thought the events are fired from tasks that are posted from the main thread. I don't think there are any other lifecycle events in workers so far.

So would the sentence "the events are fired from tasks that are posted from the main thread when pagehide and pageshow are dispatched" be fine?

hajimehoshi, Oct 15 '21

It feels weird to me for the events to be tied to the existence of a document owner (direct or indirect). I see why "persisted" would only be true if there is a doc owner, but wouldn't we want to always fire them?

Also, would these be fired for worker.terminate() or if the worker script calls self.close()?

wanderview, Oct 15 '21

Would it make sense to fire the events in a dedicated worker only when

This doesn't work as stated because a dedicated worker can be owned by a dedicated worker which can be owned by a shared worker. But I assume you are trying to go for a scenario where there are no shared workers in the ancestor chain, which would work. It's just a little weird as @wanderview calls out.

I don't think there are any other lifecycle events in workers so far.

No events, but there could be posted tasks. E.g. consider

worker.postMessage("1");
window.onbeforeunload = () => worker.postMessage("2"); 
window.onpagehide = () => worker.postMessage("3");
window.onunload = () => worker.postMessage("4");

causePageToUnload();

Is there any ordering guarantee of the pagehide event inside the worker, versus the worker receiving messages 1-4?

domenic, Oct 15 '21

I see why "persisted" would only be true if there is a doc owner, but wouldn't we want to always fire them?

Do you mean it seems weird that pagehide / pageshow are not fired for every dedicated worker?

Also, would these be fired for worker.terminate() or if the worker script calls self.close()?

I don't think the events should be fired in this case, as those are not related to a document's lifecycle.

But I assume you are trying to go for a scenario where there are no shared workers in the ancestor chain, which would work.

Ah yes, that's what I intended. The events are fired for a dedicated worker only when there is no shared worker in its ancestor chain.

Is there any ordering guarantee of the pagehide event inside the worker, versus the worker receiving messages 1-4?

I hadn't thought about that... Hmm, I feel like we should guarantee an order, but I'm not familiar with postMessage's behavior. @nhiroki What do you think?

hajimehoshi, Oct 15 '21

Do you mean it seems weird that pagehide / pageshow are not fired for every dedicated worker?

Right.

I don't think the events should be fired in this case, as those are not related to a document's lifecycle.

It seems to me events fired in a worker should relate to the worker lifecycle, not to an owner that may or may not be present. Otherwise you cannot write worker script code that relies on these events without assumptions about who owns the worker.

At the core we are adding freeze/thaw type concepts to the worker lifecycle. Workers already have the lifecycle concept of "creation" and "destruction". Do those not map to "pageshow" and "pagehide" here?

If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

wanderview, Oct 15 '21

It seems to me events fired in a worker should relate to the worker lifecycle, not to an owner that may or may not be present.

I think it's OK to fire events in a worker related to something else's lifecycle, especially if they are clearly named as such. The page prefix, IMO, makes it pretty clear.

If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

I believe the proposal in the OP is indeed just proxying. It is indeed a good question why postMessage() doesn't work. My guess is because it creates a coordination problem where you need the page author to do the proxying, which makes it hard to rely on in reusable libraries or similar that want to work in a worker. But I'd love to hear more from @hajimehoshi in that regard.
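For comparison, the userland alternative can be sketched as below: the page forwards pagehide/pageshow to its worker with postMessage, and the worker script has to recognize the forwarded messages. The message shape `{ type: "page-lifecycle", ... }` and both function names are made up for illustration. Every page that embeds the worker must install the forwarding, which is the coordination problem for reusable libraries, and nothing here lets the browser wait for the worker's reaction.

```javascript
// Page side: forward lifecycle events to a worker. `pageTarget` is assumed
// to behave like `window`; `worker` is anything with postMessage(), e.g. a Worker.
function forwardPageLifecycle(pageTarget, worker) {
  for (const type of ["pagehide", "pageshow"]) {
    pageTarget.addEventListener(type, (event) => {
      worker.postMessage({
        type: "page-lifecycle",
        event: type,
        persisted: Boolean(event.persisted),
      });
    });
  }
}

// Worker side: route forwarded messages to lifecycle callbacks. Returns true
// if the message was a forwarded lifecycle event. The worker only receives
// these if the embedding page called forwardPageLifecycle().
function handleForwardedLifecycle(data, { onHide, onShow }) {
  if (!data || data.type !== "page-lifecycle") return false;
  if (data.event === "pagehide") onHide(data.persisted);
  if (data.event === "pageshow") onShow(data.persisted);
  return true;
}
```

Inside a worker this might be wired up as `self.onmessage = (e) => { if (handleForwardedLifecycle(e.data, callbacks)) return; /* application messages */ };`.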

domenic, Oct 15 '21

I believe the proposal in the OP is indeed just proxying. It is indeed a good question why postMessage() doesn't work

The benefit of pagehide events in dedicated workers is that the browser can decide whether the page can be cached or not after all the pagehide events are done. With postMessage, the browser cannot wait for the dedicated workers' actions before the page is cached. Does this make sense?

Note that the navigation itself can be done immediately regardless of the decision of whether the page is cached or not.

hajimehoshi, Oct 18 '21

@domenic Ping

hajimehoshi, Oct 19 '21

Yes, this makes sense.

domenic, Oct 19 '21

@wanderview

At the core we are adding freeze/thaw type concepts to the worker lifecycle. Workers already have the lifecycle concept of "creation" and "destruction". Do those not map to "pageshow" and "pagehide" here? If we aren't building worker lifecycle events, but just proxying document events into the worker, then what is the benefit of the platform providing that vs userland using postMessage() themselves?

So, with this proposal, we are just proxying the pageshow and pagehide document events to workers. postMessage doesn't work because the browser needs to wait for the results of the event handlers in dedicated workers before it decides whether to cache the page. Does this make sense to you?

I'll update the proposal to make this point explicit.

hajimehoshi, Oct 20 '21

Updated the explainer. Please take a look.

hajimehoshi, Oct 20 '21

Understood. I guess it still feels weird to me that we don't have a "context closing" event for workers in general, but we do if the workers just happen to be owned by a document. I don't feel strongly enough to argue the point, though. So no objections from me.

wanderview, Oct 20 '21

It seems like the introduction of a pagehide event requires some explicit concept of a grace period for the worker to finish what it's doing and process an explicitly dispatched new task, plus all the previously enqueued tasks that might have to run first under the existing execution model? And this seems at odds with the idea that dedicated workers can be asked to do long-running work.

ServiceWorkers do provide a precedent for letting content continue to run JS after the user has navigated away, but arguably in that case maybe the relevant app logic should just be using a ServiceWorker in which case it wouldn't be under any time pressure to drop IDB connections, etc.?

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

asutherland, Oct 26 '21

@asutherland

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

For a worker not running a long task, giving the worker the chance to release resources that would block BFCaching is a win.

For a worker running a long task that would block timely execution of pagehide, we need to consider 2 cases

  1. the worker is holding resources that block BFCaching. It would not be cached with or without pagehide.
  2. the worker is not holding resources that block BFCaching.
     a. Without pagehide, we could just freeze and cache it.
     b. With pagehide, we would attempt to run pagehide and it would not run in a timely manner.

So 2b here is tricky. Can we just freeze it anyway and let the pagehide run after it comes back out of BFCache and completes the long tasks? I can't think of a reason why that would be a problem. It seems odd to run pagehide after freezing and unfreezing but the worker won't know.

fergald, Oct 27 '21

To be clear, I'm on board with the potential benefits of letting workers clean up. My concern is how long the window for "timely execution" has to be before we start avoiding case 2b and what the performance implications of granting this grace period to every dedicated worker will be. Presumably the page the user is navigating to would benefit from having those resources for itself!

If we're regularly going to be sending too-late pagehide events for non-idle Workers that aren't written to run in very tiny time-slices, maybe it would be better for the worker constructor to gain an option like terminateOnPageHideIfNecessary that indicates that in the event that the worker would make the parent ineligible for bfcache that the worker will automatically be terminated and a terminatedByPageHide event dispatched on the Worker that was terminated. Sophisticated worker setups might run the bfcache friendly logic in the parent and the bfcache-angrifying logic in a nested worker with the flag set.

Alternately, I suppose the spec could be written so that browsers could immediately freeze all the workers until the navigated page is sufficiently loaded and the latency pressure is off from their perspective. Then the browser could thaw the workers on its own schedule and give each a longer opportunity to get to process the pagehide event and clean up whatever needs to be cleaned up. If they don't get to processing the event with this longer opportunity, the page and workers are removed from bfcache. This might even be helpful for situations involving storage APIs where the extra delay could have given them time to complete if the "pagehide" event is intentionally queued only on the thaw-for-pagehide so that the storage tasks are allowed to be queued up in the meantime.

asutherland, Oct 27 '21

Thinking about the long-task-worker problem, maybe instead of pagehide/pageshow, we should be doing prepareforbfcache/resumeafterbfcache (ignore the terrible names). We would only send them if the worker is blocking BFCache. If the worker is not blocking BFCache, we can just immediately freeze it.

This removes 2 problems

  • @altimin's question of "should we send pageshow to all workers when they are created and pagehide just as they are destroyed?"
  • the issue with 2b above.

It does not solve the fact that in 1 above we would still allow a worker to consume CPU for some grace period (but that's a problem with pagehide/pageshow too).

I'm not sure that we need to spec the ordering of things. Does the current spec demand that the workers be frozen before the next page starts executing? I can imagine problems here if the user navigates to another page on the same origin and the worker is writing to shared storage but I think those problems already exist with freezing workers and resuming them after BFCache.
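The "only notify workers that block BFCache" policy from this comment, combined with cases 1 and 2 from the earlier analysis, can be read as a small per-worker decision. This is a sketch under hypothetical inputs: `handlerReleasesResources` stands in for whatever the worker's handler actually does, and how long the browser waits for that handler is left open, as it is in the discussion.

```javascript
// Hypothetical per-worker decision under the "only notify blocking workers"
// policy. Returns what the browser would do with this worker.
function bfcacheActionForWorker({ blocksBFCache, runningLongTask, handlerReleasesResources }) {
  if (!blocksBFCache) {
    // Case 2: nothing needs cleaning up, so freeze immediately without
    // sending any event. This is what sidesteps case 2b.
    return "freeze";
  }
  if (runningLongTask) {
    // Case 1: the worker still holds a blocking resource and cannot run a
    // handler in time, so the page is not cached either way.
    return "not-cached";
  }
  // Only blocking, responsive workers get the event; the outcome depends on
  // whether their handler actually releases the resource.
  return handlerReleasesResources ? "notify-then-freeze" : "notify-then-not-cached";
}
```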

fergald, Oct 28 '21

There is a grace period for shared and service workers and I suppose that could be extended to dedicated workers, though given how prevalent they are that does seem worrisome. And if they have a long-running task, how much is a couple extra seconds going to help?

One thing I wanted to ask is that when designing this feature we also keep shared and service workers in mind so that the solution can scale to them (as well as any dedicated workers they might instantiate).

annevk, Oct 28 '21

@annevk can you elaborate on the grace period for shared and service workers? What are you waiting for during that grace period? Right now Chrome allows pages with service workers into the cache; it does not signal anything to the shared worker. If the worker asks for clients, it will not be told that the page exists, and if it already had a handle to the page and tried to send it a message, the page will be evicted.

fergald, Oct 28 '21

@fergald I meant that unrelated to bfcache shared/service workers get to exist beyond the lifetime of a document for x seconds, in the event that another document appears for which they can also be used. (Firefox's implementation of these things is a bit in flux still due to site isolation. Firefox also evicts bfcache documents that receive a message.)

annevk, Oct 28 '21

So, are we fine with having dedicated BFcache-specific events rather than pagehide/pageshow, like "beingPutToBFcache" and "restoredFromBfcache"?

As service workers don't prevent pages from being cached to BFcache (at least in Chrome), these new events are not needed for service workers.

I'm not sure whether shared workers should be able to handle the events. Currently, pages using shared workers are not cached (at least in Chrome). Should shared workers' feature usage (like IndexedDB) affect the page's eligibility for BFcache in the future? When should a shared worker be frozen? As there will be a lot of discussion about them, I'd like to leave shared workers out of scope for my proposal.

hajimehoshi, Nov 04 '21

So, are we fine with having dedicated BFcache-specific events rather than pagehide/pageshow, like "beingPutToBFcache" and "restoredFromBfcache"?

My concern isn't so much about the name, but the steps related to dispatching "pagehide" or "beingPutToBFcache". Maybe it would be good to sketch what the spec algorithm would be for dispatching the event?

As service workers don't prevent pages from being cached to BFcache (at least in Chrome), these new events are not needed for service workers.

Yes, it seems like ServiceWorkers would not be involved with the event.

I'm not sure whether shared workers should be able to handle the events. Currently, pages using shared workers are not cached (at least in Chrome). Should shared workers' feature usage (like IndexedDB) affect the page's eligibility for BFcache in the future? When should a shared worker be frozen? As there will be a lot of discussion about them, I'd like to leave shared workers out of scope for my proposal.

Firefox freezes a shared worker when all of the documents in its owner set are frozen. It seems reasonable to me that a SharedWorker would receive the event when there is only one unfrozen document in its owner set and that owner is moving to be frozen. Should a frozen document in bfcache be messaged by the unfrozen SharedWorker, the document will be removed from bfcache/discarded. In Firefox (and presumably any multi-process browsers), everything involving SharedWorkers is inherently async, but should roughly look like the async handling of a dedicated worker, so I think it would be appropriate to consider SharedWorkers at the same time.
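The rule suggested here for SharedWorkers can be sketched as a predicate over the worker's owner set. It models documents as hypothetical `{ frozen }` records and only restates the behaviour described in this comment; it is not a specified algorithm.

```javascript
// Should a shared worker receive the "about to be BFcached" event when
// `freezingDoc` is about to be frozen? Per the suggestion above: only if
// that document is the last unfrozen member of the worker's owner set.
function sharedWorkerShouldBeNotified(ownerSet, freezingDoc) {
  const unfrozen = ownerSet.filter((doc) => !doc.frozen);
  return unfrozen.length === 1 && unfrozen[0] === freezingDoc;
}
```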

asutherland, Nov 04 '21

Thanks (and sorry for terribly late reply).

Should a frozen document in bfcache be messaged by the unfrozen SharedWorker, the document will be removed from bfcache/discarded. In Firefox (and presumably any multi-process browsers), everything involving SharedWorkers is inherently async, but should roughly look like the async handling of a dedicated worker, so I think it would be appropriate to consider SharedWorkers at the same time.

I see, considering shared workers at the same time makes sense. Before updating the explainer, we have to consider how Chrome/Chromium caches pages with shared workers... CC @fergald

hajimehoshi, Nov 09 '21

@asutherland the distinction between "pagehide" or "beingPutToBFcache" is not just in the name. "beingPutToBFcache" means that we wouldn't fire these on first pageshow and we wouldn't fire them on pagehide if the page is not going into BFCache. They stop being about what the page is doing and instead are about what's about to happen to the worker (which is driven by what the page is doing but the distinction is far more important for shared workers).

As for shared workers in Chrome, we don't cache them currently. They account for less than .01% (1/1000) of reasons we didn't use BFCache (about half of those are blocked by something else too). Handling them would be complex, so I doubt Chrome will ever implement support for bfcaching them. Webkit removed them, so I expect only FF has them as a practical concern.

I do agree that we should sketch out the dispatch algorithm in the explainer but unless you think we are going to discover a reason to change approach entirely, I'd like to keep the scope to dedicated workers, unless someone from FF wants to collaborate. If firing an event in a shared worker is not going to work, I don't know what would, so I don't think we need to get the shared worker story correct before making progress on dedicated workers.

fergald, Nov 10 '21

I support the refined semantics for "beingPutToBFCache". And thank you for the explicit restatement of the semantics; those make sense and I benefited from the clarity.

In terms of the SharedWorker and the event, it sounds like we all agree that the event would work for the SharedWorker if we wanted to generalize to that/support that, and that's mainly what I wanted consensus on. Given that only Firefox would support BFCaching of documents using SharedWorkers, I think it likely makes sense to just specify that SharedWorkers make a document ineligible for BFCaching in the interest of webcompat. We can always relax that in the future should there be interest in implementing it across browsers and a belief that it would meaningfully improve successful bfcaching.

And in that case the explainer and subsequently spec only need to deal with dedicated workers. I very much look forward to the next steps of this, thank you!

asutherland, Nov 10 '21

I've updated the explainer based on the discussions. I haven't come up with a better event name yet...

Please take a look, thanks!

hajimehoshi, Nov 15 '21

As there seem to be no objections to my proposal (except perhaps the name?), I'll draft a proposal for the spec. Thank you very much!

hajimehoshi, Nov 25 '21

Maybe it's different for other browsers, but when Firefox freezes a page, the worker is interrupted mid-JS-execution and all content execution stops until thawed. This is based on the same mechanism for worker termination.

What about Safari? Does anyone have insights?

hajimehoshi, Dec 02 '21

@rakina @domenic @nhiroki

Now I'm trying to figure out how to patch the current spec (WHATWG), and I'd appreciate your insights. My idea is:

  • When a page is going into BFcache (at the same timing as pagehide), the browser may send a beforePutToBFcache event to dedicated workers.
    • It's OK if the browser sends the event to all the workers, including workers that don't use a blocking feature. This might be too much, but it should not be harmful. It's also OK if the browser sends nothing, as the event is just a kind of optimization: the page won't be put into BFcache if a blocking feature is still in use.
    • The browser might not be able to deliver the event because of a long-running task. This is also OK: in that case the browser won't put the page into the cache, unless it is willing to suspend the worker in the middle of the task.
  • When a page is being restored from BFcache (at the same timing as pageshow), the browser sends an afterRestoreFromBFcache event to dedicated workers if and only if beforePutToBFcache was sent.

Thanks,

hajimehoshi, Dec 09 '21

I don't really understand why there are so many "mays". It sounds like that makes these events completely unreliable, and e.g. a browser that never sends them would be compliant with the spec. That doesn't seem like a useful feature to me. We would also be able to write zero web platform tests.

The original semantics as I understood it was that, whenever we would send pagehide with persisted = true to a document, we would send this new event to all dedicated workers owned by the document. I think we should be able to queue such an event regardless of blocking tasks or optimizations, so that we can have a rigorous and testable feature.
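A minimal sketch of the deterministic rule described here. It assumes that nested dedicated workers (those whose owner chain leads back to the document) count as "owned by the document", and the `workers`, `children` and `taskQueue` fields are hypothetical; the sketch models only which tasks get queued, not the event loop that later runs them.

```javascript
// Hypothetical model: a document lists the dedicated workers it created in
// `workers`; each worker has a `taskQueue` and may own nested dedicated
// workers in `children`.
function collectDedicatedWorkers(workers, out = []) {
  for (const worker of workers) {
    out.push(worker);
    collectDedicatedWorkers(worker.children, out);
  }
  return out;
}

// Run at the point where the UA fires pagehide or pageshow on the document.
// Whenever persisted is true, a task to fire the matching worker event is
// queued on every worker unconditionally rather than optionally.
function onPageTransition(doc, type, persisted) {
  if (!persisted) return;
  const eventType = type === "pagehide" ? "beforePutToBFcache" : "afterRestoreFromBFcache";
  for (const worker of collectDedicatedWorkers(doc.workers)) {
    worker.taskQueue.push({ fire: eventType, persisted: true });
  }
}
```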

domenic, Dec 13 '21