workbox
workbox copied to clipboard
Service worker stuck in busy state when SSE request hits the cache
Library Affected: workbox-routing
Browser & Platform: all browsers
Issue or Feature Request Description: Bug report:
Consider the following service worker
import { clientsClaim, skipWaiting } from 'workbox-core';
import { registerRoute } from **'workbox-routing';
import { StaleWhileRevalidate } from 'workbox-strategies';
skipWaiting();
clientsClaim();
registerRoute(
({ request, url }) => request.url.origin === self.location.origin && /(?<!service-worker)\.js$/.test(url.pathname),
new StaleWhileRevalidate({
cacheName: 'app',
}),
);
When a new service worker exists and installation of the newer worker begins, it would be expected that the use of skipWaiting
would activate the new service worker.
This isn't the case however because the registerRoute function instantiates a fetch handler that never calls waitUntil.
The result of this is that the updated service worker will never activate (stuck in installed
state) until the user's browser session is closed.
Hello @andyrichardson! I am not sure that your diagnosis of what's going on is accurate.
Calling respondWith(responsePromise)
on a FetchEvent
already extends the lifetime of the service worker until responsePromise
settles. C.f. the note in step 4 of the respondWith()
description in the spec:
Note:
event.respondWith(r)
extends the lifetime of the event by default as ifevent.waitUntil(r)
is called.
Calling skipWaiting()
does three things in parallel:
- sets a flag indicating that if the current service worker enters the
installed
state, it should immediately transition to theactivating
state. - tells the service worker registration associated with the current service worker that if there's a service worker that's already
waiting
, it should transition toactivating
. - returns a promise that resolves immediately with
undefined
(not waiting for the other two steps to complete).
If you're trying to debug why a service worker is waiting in the installed
state instead of activating
, it might be because you're calling skipWaiting()
in the incorrect service worker (the old one, not the new one that's actually waiting), or because your code is more complex than the example you included above and you're running afoul of something like https://github.com/GoogleChrome/workbox/issues/2525. Because of that issue, we've stopped wrapping skipWaiting()
in an install
event handler in Workbox v6, and encourage folks to just call self.skipWaiting()
instead.
Hey @jeffposnick, thanks for the response.
Calling respondWith(responsePromise) on a FetchEvent already extends the lifetime of the service worker until responsePromise settles.
Totally with you on this - even without checking the spec, I was under a similar assumption that respondWith
would remove the need for waitUntil
.
I've managed to consistently reproduce this issue on Chrome and Firefox and the one line change in the PR fixed the problem. I'm not totally sure why this would be the case.
I'll spend some time this week creating a reproduction so you can try it out for yourself!
Just to confirm, have you tried creating a service worker which calls registerRoute
and then subsequently try and update that service worker? This has consistently caused the new service worker on to get stuck in the installed state on Chrome and Firefox on my end.
Before you spend too much more time debugging this, can you try a quick swap from using workbox-core
's skipWaiting()
to using self.skipWaiting()
instead, and see if that resolves things? (You could alternatively update to Workbox v6. Following #2525, the two are equivalent.)
The previous implement of workbox-core
's skipWaiting()
wrapped the call to self.skipWaiting()
in an install
event handler, and that ended up introducing edge cases. That's really the only thing that comes to mind as a possible cause of what you're describing.
I had tried calling self.skipWaiting:
- On js load
- On install event
- On message from browser (w/ message added to browser)
But no dice!
The current project I'm on uses whatever is bundled with CRA but I'll try a standalone example without CRA and see if this is still a concern.
That's very strange.
What happens if you go to Chrome's DevTools and manually click on the "skipWaiting" link that you should see next to the waiting service worker?
data:image/s3,"s3://crabby-images/5e9cf/5e9cffde292a7c83e16c4ff21e676e8aae5eb5bc" alt="77b5eaa130252f7e"
That's very strange.
What happens if you go to Chrome's DevTools and manually click on the "skipWaiting" link that you should see next to the waiting service worker?
Manually clicking skipWaiting
or update
does nothing (no error or visual indication of a change). I have to either hit stop
or unregister
to get the new service worker activated.
I got around to creating a simple reproduction (with v5 and v6 package versions) but I'm unable to reproduce the issue in a more basic environment so I'm not sure what the cause is.
If I'm not mistaken, this nesting implies that the service worker has spawned a worker thread - that might be what is keeping the service worker from unloading.
data:image/s3,"s3://crabby-images/d67a6/d67a6d979ce8a0cabaa5946f4c3d142eb3f2d247" alt="Screen Shot 2020-12-08 at 18 28 39"
That spawned worker thread looks to have some relation to buffering
data:image/s3,"s3://crabby-images/d11e2/d11e2285b7f3891540a59f9d30caad145f1b53aa" alt="Screen Shot 2020-12-08 at 18 31 18"
Jeeeeeezzz
Okay I found the cause of the issue.
registerRoute(
({ url }) => /^somefeatureflagservice/.test(url.host),
new StaleWhileRevalidate({
cacheName: `flags`,
}),
);
In my case, I was caching a feature flagging network request for offline support.
This request was a ~websocket~ Server-sent event (SSE) request.
I guess that explains why adding waitUntil
along with respondWith
fixed the issue.
respondWith
doesn't flag the event as closed with a streaming connection (multiple responses are possible) while waitUntil(someProm)
does (once someProm resolves I'm guessing it assumes a connection close).
~Not sure where we go from here - it would be nice to have that waitUntil
addition to cover situations where websocket requests are being cached.~
Edit: Both via the cache and network, SSE requests look to get infinitely stuck in the pending state when using
StaleWhileRevalidate
Is workbox's cache intended to work with SSE/websocket/streaming connections?
If not, do we want to add in a warning / skip the cache altogether so we don't keep a fetch event open indefinitely?
Ah, okay, glad that there's an explanation!
Service workers don't intercept WebSocket requests, but they can be used with the Streams API, and they also explicitly do intercept SSE, as per this thread that @wanderview participated in back in the day. That thread discussed potential edge cases and had an overall consensus that using service workers with SSE was likely to be uncommon.
I'm not ruling out making a change to Workbox if needed, but I'm wondering if there's an underlying issue that would better be addressed by the browser(s). You mentioned both Chrome and Firefox exhibited that behavior—how about Safari?
I'd like to summarize the issue and then pull in some folks for their opinion. Please correct me if I'm wrong about any of this:
-
You have a service worker, using Workbox, whose
fetch
handler callsevent.respondWith(responsePromise)
to generate a response for a SSE. (I'm not clear on whether this was a mistake since you didn't realize it was a SSE, or whether you actually meant to do this.) -
Doing so keeps that service worker alive for an extended period of time. (Presumably until the server closes the connection, though maybe there's a hard time limit imposed by the browser?)
-
If, during the time that the active service worker is being kept alive, a newly installed service worker calls
self.skipWaiting()
, this has no effect. The new service worker remains in theinstalled
state indefinitely, and things appear to be "stuck." -
But, if the active service worker had called
event.waitUntil(responsePromise)
inside itsfetch
handler, immediately prior to callingevent.respondWith(responsePromise)
, then callingself.skipWaiting()
in the newly installed service worker behaves as intended, and things are not "stuck." (This is what https://github.com/GoogleChrome/workbox/pull/2693 added.) -
The service worker spec states that "
event.respondWith(r)
extends the lifetime of the event by default as ifevent.waitUntil(r)
is called.", so I find it unexpected that explicitly callingevent.waitUntil()
changes the behavior.
I would expect the service worker to be killed after 5 minutes of being held open by respondWith() processing an SSE response. (Unless another FetchEvent or something occurred since then of course...)
Note, if you open devtools then chrome will not terminate the service worker due to this timeout.
Thanks, @wanderview. Any thoughts on those final two points, where adding in an explicit event.waitUntil()
prior to the event.respondWith()
leads to different behavior? Does that sound like a browser bug, or am I misinterpreting the service worker spec?
Doesn't make sense from the browser perspective. I don't really know what workbox is doing around where it calls those, though. Maybe its an incorrect association and the real issue was whether devtools had been opened or not, etc.
Gotcha. If the browser technically shouldn't require that explicit event.waitUntil()
, then I'd rather not add it in to Workbox to work around the reported behavior.
@andyrichardson, what if we added additional logging that's enabled in the workbox-routing
development builds which checks for fetchEvent.request.headers.get('accept') === 'text/event-stream'
and warns when workbox-router
attempts to respond to such an event?
I'll wait to hear back from you, but my understanding is that you didn't actually want to have the service worker intercept that SSE, and it's not clear how a StaleWhileRevalidate
strategy could properly handle it anyway—though a different workbox-strategy
, like NetworkOnly
, presumably would be able to handle it.
Hey folks! Thanks for the discussion on this!
You have a service worker, using Workbox, whose fetch handler calls event.respondWith(responsePromise) to generate a response for a SSE. (I'm not clear on whether this was a mistake since you didn't realize it was a SSE, or whether you actually meant to do this.)
Yes that's pretty much it
registerRoute(
({ url }) => /^somefeatureflagservice/.test(url.host),
new StaleWhileRevalidate({
cacheName: `flags`,
}),
);
what if we added additional logging that's enabled in the workbox-routing development builds which checks for fetchEvent.request.headers.get('accept') === 'text/event-stream' and warns when workbox-router attempts to respond to such an event?
In my case, I wasn't even aware that SSE was present on the app - that request was being made by a third party library. I think the sketchy part here is that once we get into that SEE w/ workbox caching state - it's impossible to push a new service-worker (or app in the case of precaching) and you're at the mercy of when the user exits their browser session.
If it's possible to prevent that locked-in condition whether that be by ignoring text/event-stream
requests or by calling waitUntil
(which I would guess would cover all streaming protocols) I think that's the safest option - it doesn't look like SEE can ever work with workbox's cache anyhow.
If not, maybe documentation on filtering out text/event-stream be the second best option.
Side note - CRA for better or for worse only enables service workers in production builds. Obviously there's more to web dev than just CRA but it's worth noting that a lot of users could encounter this issue and not be aware until it hits production.