navigation-timing
workerStart and redirects
This addresses a few issues around `workerStart`, especially in the case of redirects:
- `workerStart` needed to be added to the diagram https://github.com/w3c/navigation-timing/issues/128 (see screenshot below)
  - I've added a new Worker Startup phase that happens prior to Redirect
  - I've added a `workerStart` timestamp before the start of the Worker Startup phase
  - For clarity, I've added a Cross Origin Workers & Redirects section prior to the new Same Origin section
- `workerStart` definition was cleaned up. Notably, it's:
  - `0` for no SW (same as before)
  - if redirects, the startup of the first request in the final same-origin redirect chain (new to address https://github.com/w3c/navigation-timing/issues/99 and https://github.com/w3c/navigation-timing/issues/100)
  - if it's already available (navigating between two docs on same origin), the time before `fetch` (same as before)
  - otherwise, SW startup time (same as before)
- In the processing model, added a new `worker-start-step` that is split out from the step that was setting `unloadEventEnd`
  - `workerStart` was zero-ed out in the case of same-origin redirects -- it should be the startup of the worker from the first request in the final same-origin redirect chain https://github.com/w3c/navigation-timing/issues/128#issuecomment-674441549
    - Same-Origin redirects no longer zero out `workerStart`, and will still jump back to `fetch-start-step` so it doesn't overwrite the worker startup time for the same origin
  - `workerStart` was the value of the cross-origin fetch during cross-origin redirects -- it should be `0` (or updated by later same-origin SWs) https://github.com/w3c/navigation-timing/issues/128#issuecomment-674441549
    - The processing model keeps track of last origin `workerStart` was set for, and will reset `workerStart` if the origin ever changes.
- The Same-origin check was missing a case where a same-origin no-redirect navigation was returning `"fail"`, so added a new step to check for no redirects and return `"pass"`.
- Adds `workerStart` to the list of things that NavTiming2 has over NavTiming1
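The `workerStart` rules described above can be sketched roughly as follows. This is an illustration, not the spec's processing model: the `Hop` shape, its `workerTime` field, and the `resolveWorkerStart` name are hypothetical, and the sketch assumes each request already knows whether a service worker handled it and when.

```typescript
// Hypothetical model of one request in a navigation's redirect chain.
interface Hop {
  origin: string;
  // Time just before the worker was run (or the fetch event was fired) for
  // this request; 0 when no service worker handled it.
  workerTime: number;
}

// Returns the workerStart value for the whole navigation under the rules
// listed above. Illustrative only.
function resolveWorkerStart(chain: Hop[]): number {
  let workerStart = 0;
  let lastOrigin: string | null = null;
  for (const hop of chain) {
    if (hop.origin !== lastOrigin) {
      // Origin changed (or first request): discard any earlier value and
      // take this request's worker time, which is 0 if there is no worker.
      workerStart = hop.workerTime;
      lastOrigin = hop.origin;
    }
    // Same-origin redirect: keep the value recorded for the first request
    // of this same-origin chain.
  }
  return workerStart;
}

// Same-origin redirect: the first request's worker time (100) is kept.
console.log(resolveWorkerStart([
  { origin: "https://b.test", workerTime: 100 },
  { origin: "https://b.test", workerTime: 180 },
]));
```

Tracking the last origin is what lets a cross-origin hop reset the value while a same-origin redirect leaves it untouched.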
Current diagram:
I will also be reviewing the current WPTs to lock-in this behavior, assuming we all agree to the above changes.
I also need to review how `workerStart` is defined in https://www.w3.org/TR/resource-timing-2/ to make sure it's compatible (or update that too).
Today, there can be delays between fetchStart and workerStart in implementations. Does workerStart need to be clearly defined as possibly after fetchStart in the diagram and spec or is the intent to change that?
Oh hi Todd :) I think the diagram is supposed to imply that, but it should probably be made clearer by moving `workerStart` up to be next to `fetchStart`
@nicjansma did you accidentally remove the PR preview link from the first comment? I was hoping to look at the new diagram since I just realized I commented on the old one.
My point is that I think we DO want a space in between fetchStart and workerStart but the diagram doesn't show that. When I see a delay between fetchStart and workerStart, I've historically recommended that the site should consider not blocking load on the serviceworker if the design can allow for it so having the gap is valuable.
I haven't considered what to name that bucket of time but the goal is to allow all time to be measured and broken down so a web page author can either communicate with the browser vendor OR improve their own site to improve E2E timing.
@npm1 Yeah, I had to remove the Preview link as it was stopping me from making edits to the comment for some reason. I've added it back in.
(I didn't realize what those links did initially, that's very useful!)
@toddreifsteck I think `workerStart` is always before `fetchStart`, according to the current model and UA behavior, correct? We define `fetchStart-workerStart` as the "Service Worker Startup Time" in mPulse for example.
Agreed that we can help make that more clear. Right now I've put `workerStart` on the bottom, pointing at roughly the same time as `fetchStart`. I could instead move it to the "top row", but before `fetchStart`.
Alternatively, I could add a new "phase" called "Service Worker Startup Time" (or just "Worker" or whatever), to make it even more clear.
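For reference, the `fetchStart-workerStart` calculation mentioned above could look something like this. It is a sketch only: `WorkerTimingFields` and `serviceWorkerStartupTime` are hypothetical names, and it assumes `workerStart` is `0` when no service worker was involved. Because the thread has not settled whether `workerStart` can come after `fetchStart`, a negative difference is reported as `null` rather than as a duration.

```typescript
// Minimal stand-in for the two navigation timing fields used here.
interface WorkerTimingFields {
  workerStart: number;
  fetchStart: number;
}

// Returns fetchStart - workerStart, or null when no worker was involved
// (workerStart is 0) or when workerStart is reported after fetchStart.
function serviceWorkerStartupTime(t: WorkerTimingFields): number | null {
  if (t.workerStart === 0) return null;
  const delta = t.fetchStart - t.workerStart;
  return delta >= 0 ? delta : null;
}

console.log(serviceWorkerStartupTime({ workerStart: 120, fetchStart: 175 }));
```

In a browser, the entry returned by `performance.getEntriesByType("navigation")[0]` exposes both fields and could be passed in directly.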
Discussions from the W3C call on 9/24:
- I'll update the SVG graphic to include a new "Worker" phase right before "AppCache" to make it clear
- We realized if there are any same-origin redirects, `workerStart` becomes useless for calculating "Worker Startup Time" because the time of redirects could not be excluded.
For example:
1. starting on a.com, navigate to a link on b.com/1 (which has a worker)
2. b.com starts up a worker (=`workerStart` per this new proposed model)
3. b.com worker fetches b.com/1
4. b.com/1 responds with a redirect to b.com/2
5. b.com worker fetches b.com/2 (=`fetchStart`)
6. navigation ends on b.com/2
In this case, `redirectStart/End/Count` are `0` because of same-origin policy (started on a different origin). `workerStart` is at the beginning of the same-origin redirect chain (step 2), but we don't necessarily know it's a redirect (because SOP). However, `fetchStart-workerStart` is not just the worker startup time -- it includes the redirect time as well (and we can't know that it is a redirect).
Instead, if we kept the current model where `workerStart` is just the startup time of the final document fetch (between step 4/5 above), then `workerStart` isn't really the "startup" time since it's already running -- it's just the time the worker takes right before it fetches b.com/2 in step 5.
I think it's important to try to measure "Worker Startup Time" for sites that have Service Workers deployed, and this time can often be seen to be a measurable amount (>50ms in some cases). If so, should we add a new timestamp at the end of SW startup? Maybe `workerStartComplete` (?) because the worker isn't ending, just the startup is completing?
Then `workerStart` and `workerStartComplete` can be the worker startup time (`workerStartComplete-workerStart`). For redirects, it would be the last time it started up (step 2 above).
Thoughts?
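To make the example above concrete, here is the same arithmetic with made-up numbers. All timestamps are hypothetical, and `workerStartComplete` is only the timestamp proposed in this comment, not an existing attribute.

```typescript
// Hypothetical timestamps (ms) for the b.com example above.
const timeline = {
  workerStart: 100,         // step 2: b.com worker begins starting up
  workerStartComplete: 160, // proposed timestamp: worker finished starting up
  redirectResponse: 400,    // step 4: b.com/1 responds with a redirect
  fetchStart: 410,          // step 5: worker fetches b.com/2
};

// What a page can compute today: includes the hidden same-origin redirect.
const measured = timeline.fetchStart - timeline.workerStart;

// What the proposed timestamp would allow: startup time only.
const actualStartup = timeline.workerStartComplete - timeline.workerStart;

// Redirect time that is indistinguishable from startup without the new timestamp.
const hiddenRedirectTime = measured - actualStartup;

console.log({ measured, actualStartup, hiddenRedirectTime });
```

With these numbers, `fetchStart-workerStart` reports 310 ms even though the worker took only 60 ms to start.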
Yea, this makes sense, I think this is what they're calling `workerReady` in https://github.com/w3c/resource-timing/issues/119. Are you thinking on adding a new parameter in this PR though? Or just updating the image and the normative text so that it's more aligned with what it's supposed to be, and then later have a separate PR for the new attribute
Ahh great, thanks for pointing me to that one.
I'll add `workerStart` to the diagram here, then followup with a separate proposal (and again updated image) for `workerReady`.
Let us know when you'd like another review on this
@npm1 OK I think everything's all set now. I've updated the original description with the current changes too.
After this is merged, I can take a stab at `workerReady` e.g. https://github.com/w3c/resource-timing/issues/119
FYI @makotoshimazu and @mfalken.
@nicjansma I think workerStart can't be before fetchStart on the very first navigation to a web page, can it?
I'm also unsure if it is before or after fetchStart when navigation preload is used. https://w3c.github.io/ServiceWorker/#service-worker-registration-navigationpreload
For a page with a service worker installed, it does seem that serviceWorkerStart could be the first event but I always believed the intent of fetchStart was to mark when the UA determined the fetch algorithm should be processed on a URL and to clearly show when unload is complete and the next phase is starting.
If discussions have shown that isn't how the metrics are implemented or are used, please accurately specify them and don't block on my gut belief. 👍 :)
Apologies for missing this. I see https://github.com/w3c/resource-timing/issues/119 has been linked to which explains some of the issues around here. As that issue notes, `fetchStart-workerStart` to measure startup time is currently broken as specified: fetchStart is always before workerStart.
I'm wondering about the decision to use the first request of the last same-origin redirect chain for `workerStart`. What if that request was not in the scope of a service worker, whereas a later request was? In that case workerStart would be 0?
I wonder if it's more consistent to just always use the final request for `workerStart`.
@toddreifsteck:
@nicjansma I think workerStart can't be before fetchStart on the very first navigation to a web page, can it?
For the scenario where a visitor has never been to a site before, and thus the browser does not have an active Service Worker registration? In that case, `workerStart` would be `0` via this text:
If the current document has no active service worker registration [SERVICE-WORKERS], this attribute MUST return zero.
Or are you talking about another scenario?
@toddreifsteck:
I'm also unsure if it is before or after fetchStart when navigation preload is used. https://w3c.github.io/ServiceWorker/#service-worker-registration-navigationpreload
Great question, nothing in this spec deals with Navigation Preload. Do you want to file a separate issue to track that?
@toddreifsteck:
For a page with a service worker installed, it does seem that serviceWorkerStart could be the first event but I always believed the intent of fetchStart was to mark when the UA determined the fetch algorithm should be processed on a URL and to clearly show when unload is complete and the next phase is starting. If discussions have shown that isn't how the metrics are implemented or are used, please accurately specify them and don't block on my gut belief
Yeah I think before we acknowledged the existence of Service Workers in this spec (and in RT), `fetchStart` was the best "starting place" for the current document's fetch timings. i.e. it's after all of the previous page's unloading and any redirects. With `workerStart`, we realized that there may be some "bootup" time in the SW before the actual fetch is dispatched, so it was placed before `fetchStart` in those cases. In practice all current browsers (Chrome, FF) that support this show `workerStart` before `fetchStart`
@mfalken:
Apologies for missing this. I see w3c/resource-timing#119 has been linked to which explains some of the issues around here. As that issue notes, fetchStart-workerStart to measure startup time is currently broken as specified: fetchStart is always before workerStart.
Thanks for sharing that. If I understand it correctly:
- RT's current spec has `workerStart` after `fetchStart`
- NT's current spec has `workerStart` before `fetchStart`
- This PR keeps the NT spec with `workerStart` before `fetchStart`
- Chrome in ~2018 may have had `workerStart` after `fetchStart`, but that was a bug and as of right now, Chrome and Firefox both have `workerStart` before `fetchStart`
- In that thread and this thread, we think `workerStart` should be before `fetchStart`
@mfalken:
I'm wondering about the decision to use the first request of the last same-origin redirect chain for workerStart.
The intent was to be able to capture the "worker bootup" or "worker startup" time for a domain, i.e. before a domain handles its first request. We were hoping the first request in the chain got us that. If we were to (re)set `workerStart` to be the last request in the chain, then there should be little/no reported worker startup time because it had already started up for the first request.
@mfalken:
What if that request was not in the scope of a service worker, whereas a later request was? In that case workerStart would be 0?
That's a good question, we don't discuss scope at all. My assumption is the SW is not "boot"ed if the scope doesn't match in that first request, right? So in that scenario the SW would "boot" for the second+ request, and that bootup time would only be reflected in the `redirectEnd-redirectStart` duration but not as a separate timestamp.
Do you think we should clarify the processing model to be something like `startup of the first request **that is in scope of a service worker** in the final same-origin redirect chain` or something? Starts getting more complicated...
@mfalken:
I wonder if it's more consistent to just always use the final request for workerStart.
But I think in that case the cost of the SW bootup time will always be "0"ish any time there's a redirect.
And regardless, if we really want to be able to measure SW bootup time if there are redirects, we need a `workerReady` timestamp as that thread proposes, or the redirects will be part of `fetchStart-workerStart`.
Above all else I think we all want to try to make the NT and RT definition and processing model align everywhere that's possible. So if we want these (and/or more) changes in NT we should also have agreement they belong in RT as well.
I think the new definition of workerStart will allow sites to measure the overhead of Service Worker startup on the main page as the gap between fetchStart-workerStart and they can measure the total time as responseEnd-workerStart if a SW is involved so this seems to solve that problem at a high level.
I don't know how many sites use Navigation Preload but the spec should handle it cleanly. Please open an issue if you believe it is worth tracking. I'm not active in spec work in my new role.
Sorry for the delay, I took some days off.
@mfalken:
Apologies for missing this. I see w3c/resource-timing#119 has been linked to which explains some of the issues around here. As that issue notes, fetchStart-workerStart to measure startup time is currently broken as specified: fetchStart is always before workerStart.
Thanks for sharing that. If I understand it correctly:
- RT's current spec has `workerStart` after `fetchStart`
- NT's current spec has `workerStart` before `fetchStart`
I may be missing something, but the two specs seem to have `workerStart` after `fetchStart`. NT says `workerStart` is when the service worker was started up, or when the fetch event was dispatched. And it says `fetchStart` is the entry point to the Fetch spec ("immediately before a user agent starts the fetching process" is the clause that applies for service worker interception, I think). Worker startup and event dispatch happens in the course of the Fetch spec, so `workerStart` would be after `fetchStart`.
- This PR keeps the NT spec with `workerStart` before `fetchStart`
- Chrome in ~2018 may have had `workerStart` after `fetchStart`, but that was a bug and as of right now, Chrome and Firefox both have `workerStart` before `fetchStart`
- In that thread and this thread, we think `workerStart` should be before `fetchStart`
@mfalken:
I'm wondering about the decision to use the first request of the last same-origin redirect chain for workerStart.
The intent was to be able to capture the "worker bootup" or "worker startup" time for a domain, i.e. before a domain handles its first request. We were hoping the first request in the chain got us that. If we were to (re)set `workerStart` to be the last request in the chain, then there should be little/no reported worker startup time because it had already started up for the first request.
@mfalken:
What if that request was not in the scope of a service worker, whereas a later request was? In that case workerStart would be 0?
That's a good question, we don't discuss scope at all. My assumption is the SW is not "boot"ed if the scope doesn't match in that first request, right? So in that scenario the SW would "boot" for the second+ request, and that bootup time would only be reflected in the `redirectEnd-redirectStart` duration but not as a separate timestamp.
Do you think we should clarify the processing model to be something like `startup of the first request **that is in scope of a service worker** in the final same-origin redirect chain` or something? Starts getting more complicated...
This makes sense. I think the "in scope of a service worker" would be a worthwhile clarification... or rather it should be something like "the first request that is in the same scope of the FINAL in-scope request that is same-origin to the final request in the redirect chain". Suppose there are two scopes: `a.test/scope1` and `a.test/scope2`, and the redirect chain is `a.test/scope1/page1` -> `a.test/scope2/page2` -> `a.test/scope2/page3`. This would boot up a SW at `scope1` and then another one at `scope2`. I think we want to capture the `scope2` SW startup time. But this can be follow-up.
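One way to read the scope rule suggested above is sketched below. This is an interpretation, not agreed spec text: `ScopedRequest` and `pickWorkerStartRequest` are hypothetical names, the matching scope of each request is assumed to be known up front, and the search is assumed to be limited to the final same-origin run of requests.

```typescript
// Hypothetical model: one request in the redirect chain, with the scope of
// the service worker registration that matched it (null if none matched).
interface ScopedRequest {
  url: string;
  origin: string;
  scope: string | null;
}

// Picks the request whose worker startup would be reported as workerStart.
function pickWorkerStartRequest(chain: ScopedRequest[]): ScopedRequest | null {
  if (chain.length === 0) return null;
  const finalOrigin = chain[chain.length - 1].origin;

  // Find where the final same-origin run of requests begins.
  let start = chain.length - 1;
  while (start > 0 && chain[start - 1].origin === finalOrigin) start--;

  // Scope of the final in-scope request within that run.
  let finalScope: string | null = null;
  for (let i = chain.length - 1; i >= start; i--) {
    if (chain[i].scope !== null) {
      finalScope = chain[i].scope;
      break;
    }
  }
  if (finalScope === null) return null;

  // First request in the run that shares that scope.
  for (let i = start; i < chain.length; i++) {
    if (chain[i].scope === finalScope) return chain[i];
  }
  return null;
}

const chosen = pickWorkerStartRequest([
  { url: "a.test/scope1/page1", origin: "a.test", scope: "a.test/scope1" },
  { url: "a.test/scope2/page2", origin: "a.test", scope: "a.test/scope2" },
  { url: "a.test/scope2/page3", origin: "a.test", scope: "a.test/scope2" },
]);
console.log(chosen ? chosen.url : null); // a.test/scope2/page2
```

For the chain in the comment this selects `a.test/scope2/page2`, the request that boots the `scope2` worker.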
@mfalken:
I wonder if it's more consistent to just always use the final request for workerStart.
But I think in that case the cost of the SW bootup time will always be "0"ish any time there's a redirect.
And regardless, if we really want to be able to measure SW bootup time if there are redirects, we need a `workerReady` timestamp as that thread proposes, or the redirects will be part of `fetchStart-workerStart`.
Above all else I think we all want to try to make the NT and RT definition and processing model align everywhere that's possible. So if we want these (and/or more) changes in NT we should also have agreement they belong in RT as well.
Agreed that `workerReady` seems to be what we're missing, and generally aligning the processing models with Fetch + Service Worker is what we want. This was discussed at the Service Worker WG briefly at https://docs.google.com/document/d/1ybS1q2HCPh3bNNOkjGpAPFug19A2BsIxYEi-i6lrB1w/edit#heading=h.k78cttk5esfw with the rough outcome that integrating the Timing Specs with Fetch is something that will need more work.
@mfalken:
I may be missing something, but the two specs seem to have workerStart after fetchStart. NT says workerStart is when the service worker was started up, or when the fetch event was dispatched. And it says fetchStart is the entry point to the Fetch spec ("immediately before a user agent starts the fetching process" is the clause that applies for service worker interception, I think). Worker startup and event dispatch happens in the course of the Fetch spec, so workerStart would be after fetchStart.
Ah, and I'm not as familiar with the Fetch spec steps, so I had originally read this differently (that when `workerStart` is just "fetch event dispatched", that's the same as `fetchStart`, and `fetchStart` is more like step "D" here).
I think part of the discrepancy is the description of `workerStart` in the NT spec differs from the processing model. Here's the description:
The workerStart attribute MUST return the time immediately before the user agent ran the worker (if the current document has an active service worker registration [SERVICE-WORKERS]) required to service the request, or if the worker was already available, the time immediately before the user agent fired an event named fetch at the active worker. Otherwise, if there is no active worker this attribute MUST return zero.
And I agree per your reasoning if `[workerStartup=[ran the worker] or [fired fetch event]]`, both of which happen in Fetch spec, and `fetchStart` is the entry point of Fetch spec, then `workerStart` would be after `fetchStart`.
However the processing model goes in a different "timestamp order":
- Immediately after the unload event is completed, record the current time as unloadEventEnd. If the navigation URL has an active worker registration, immediately before the user agent runs the worker record the time as workerStart, or if the worker is available, record the time before the event named fetch is fired at the active worker. Otherwise, if the navigation URL has no matching service worker registration, set workerStart value to zero.
- [fetch-start-step] If the new resource is to be fetched using a "GET" request method, immediately before a user agent checks with the relevant application caches, record the current time as fetchStart. Otherwise, immediately before a user agent starts the fetching process, record the current time as fetchStart.
From this processing model, in order, it seems `workerStart` would always be before `fetchStart`.
Stepping back, I think part of the confusion is `workerStart` was gradually added to the NT/RT specs, then over time both specs were adapted more to be consistent with the Fetch spec. And maybe we're not referencing the exact correct parts of the Fetch spec?
So maybe what I'm arguing here is that `fetchStart` shouldn't be the entry point of Fetch spec, but rather step "D"?
In practice today, Chrome seems to consistently set `workerStart` to be before `fetchStart`. (Firefox/Safari don't seem to implement `workerStart` yet).
@mfalken:
This makes sense. I think the "in scope of a service worker" would be a worthwhile clarification... or rather it should be something like "the first request that is in the same scope of the FINAL in-scope request that is same-origin to the final request in the redirect chain". Suppose there are two scopes: a.test/scope1 and a.test/scope2, and the redirect chain is a.test/scope1/page1 -> a.test/scope2/page2 -> a.test/scope2/page3. This would boot up a SW at scope1 and then another one at scope2. I think we want to capture the scope2 SW startup time. But this can be follow-up.
👍
@mfalken:
Agreed that workerReady seems to be what we're missing, and generally aligning the processing models with Fetch + Service Worker is what we want. This was discussed at the Service Worker WG briefly at https://docs.google.com/document/d/1ybS1q2HCPh3bNNOkjGpAPFug19A2BsIxYEi-i6lrB1w/edit#heading=h.k78cttk5esfw with the rough outcome that integrating the Timing Specs with Fetch is something that will need more work.
Awesome, let's work towards that!
Tried to summarize where we're at for the WebPerf WG https://docs.google.com/presentation/d/1r3FwT1UTo7lpjZvYe-YV7cNAee8co-qCxIU5SdERalQ/edit
Following the work I was doing on RT/Fetch integration, I want to make a concrete proposal for discussion here about how to handle redirects (beyond the diagram).
First of all, I think this should be in ResourceTiming and not in NavigationTiming, as workerStart is relevant for NT only because NT is an augmentation of RT (RT is more connected with fetching, NT more with document life-cycle).
The problem with `workerStart` and redirects is not unique to `workerStart` - the same problem exists for the other HTTP-related metrics in RT: `domainLookupStart`, `domainLookupEnd`, `connectStart`, `secureConnectionStart`, `connectEnd`, `requestStart`, `responseStart`, `nextHopProtocol`.
The problem is that in the case of redirects, any of these metrics could have several values, and due to a mixture of caching/workers/http, the "last" one might be ambiguous - for example, the last `workerStart` might be before the last `connectStart` if one of the workers was a redirect and the last request was an HTTP connection.
I propose doing the following:
- The following metrics: `domainLookupStart`, `domainLookupEnd`, `connectStart`, `secureConnectionStart`, `connectEnd`, `requestStart`, `responseStart`, `nextHopProtocol`, `workerStart`, of the ResourceTiming/NavigationTiming entry would be the ones relevant for fetching the final resource, ignoring redirects. They would be matching the `fetchStart` metrics.
- `redirectStart`, `redirectEnd`, `fetchStart` and `responseEnd` will stay as is.
- Following that, consider including a "redirects" array in the RT entry, which is an array of RT entries with the redirect URL as the name of the entry and its own set of connection/worker metrics. This array would be empty if TAO fails.
- For worker-served responses, `workerStart` should be the time before the request was handed to the worker, and `responseStart` should be the time when the worker returned a non-null response to `fetch`. `domainLookupStart`, `domainLookupEnd`, `connectStart`, `secureConnectionStart`, `connectEnd`, `requestStart`, `responseStart`, `nextHopProtocol` would be zero/empty.
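As a rough picture of the entry shape this proposal implies, consider the sketch below. Everything in it is hypothetical: `RequestMetrics`, `EntryWithRedirects`, `buildEntry`, and the `taoPasses` flag are illustrative names, only a subset of the listed metrics is modelled, and none of this exists in ResourceTiming today.

```typescript
// Hypothetical per-request metrics; a subset of the fields listed above.
interface RequestMetrics {
  name: string; // URL of the request
  workerStart: number;
  connectStart: number;
  requestStart: number;
  responseStart: number;
}

// Hypothetical entry shape: top-level metrics describe the final request,
// and `redirects` carries one record per redirect.
interface EntryWithRedirects extends RequestMetrics {
  redirects: RequestMetrics[];
}

// Builds an entry from the ordered list of requests in a redirect chain.
function buildEntry(requests: RequestMetrics[], taoPasses: boolean): EntryWithRedirects {
  if (requests.length === 0) throw new Error("at least one request is required");
  const final = requests[requests.length - 1];
  const earlier = requests.slice(0, requests.length - 1);
  return {
    name: final.name,
    workerStart: final.workerStart,
    connectStart: final.connectStart,
    requestStart: final.requestStart,
    responseStart: final.responseStart,
    // Per-redirect details are exposed only when the TAO check passes.
    redirects: taoPasses ? earlier : [],
  };
}

const entry = buildEntry(
  [
    { name: "https://b.test/1", workerStart: 100, connectStart: 0, requestStart: 0, responseStart: 150 },
    { name: "https://b.test/2", workerStart: 160, connectStart: 0, requestStart: 0, responseStart: 210 },
  ],
  true,
);
console.log(entry.workerStart, entry.redirects.length);
```

Keeping the top-level metrics tied to the final request avoids the ambiguity about which "last" value wins, while the array preserves per-redirect detail for callers allowed to see it.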
We had a further discussion on this as well on March 18th 2021 in the WebPerfWG call, with ServiceWorker folks:
https://w3c.github.io/web-performance/meetings/2021/2021-03-18/index.html
I will address that and @noamr's feedback in this PR soon, and probably will need to just wait until https://github.com/w3c/navigation-timing/pull/141 goes in for simplicity.
@noamr:
Thanks for putting your suggestions together! Overall I agree on the simplification.
For worker-served responses, workerStart should be the time before the request was handed to the worker, and responseStart should be the time when the worker returned a non-null response to fetch. domainLookupStart, domainLookupEnd, connectStart, secureConnectionStart, connectEnd, requestStart, responseStart, nextHopProtocol would be zero/empty.
I think this would cause some "reduced insight" into resources vs. today, as you would lose details of DNS/TCP/req/res phases for all resources if a SW is active, right?
If the worker was just operating as a "pass through" for a resource, it seems like we should still get those breakdown in timings (assuming origin check passes).
Yes, though we should make it clearer in FETCH. I created a new issue for that: https://github.com/whatwg/fetch/issues/1208
Where are we on this PR? What's the next step?
Based on the conversations we had at WG, I think this PR covers the issue.
- redirect timing et al are all part of a fetch rather than a response. So if a response is shared across fetches (e.g. a passthrough in a service worker, an in-flight sharing of responses, retrieving from cache) - the timing is separate - including the connection timing. The only thing that "sticks" with the response is the encoded/decoded body size.
I think this can be closed now. @nicjansma ?
@nicjansma - friendly ping :)