Distinguish Navigation URLs and Frame URLs
Terminology
This doesn't have to stick, I just need it to make writing the issue easier.
- Navigated (Navigation?, Document?) URL: The last URL the browser performed a hard navigation to.
  - Lighthouse resolves this URL by tracking CDP `Page.frameNavigated` events.
  - Can only be determined in Lighthouse if there was a navigation.
- Frame URL: The URL that appears in the address bar. Can be changed with `history.pushState` or anchor links without making an additional network request or performing a hard navigation.
  - Lighthouse legacy navigation runner does not use this URL anywhere.
  - URL can be queried using `Page.getFrameTree`.
  - Lighthouse FR runners can resolve this URL using the Puppeteer `page.url()` function (wrapped by `driver.url()`).
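To make the distinction concrete, here is a minimal sketch that replays a simplified event stream and tracks both URLs. The event shapes and the `UrlTracker` class are hypothetical and exist only for illustration. They loosely mirror CDP's `Page.frameNavigated` (hard navigation) and `Page.navigatedWithinDocument` (same-document change such as `history.pushState` or an anchor link), and are not how Lighthouse actually implements this.

```typescript
// Hypothetical, simplified event shapes (not the real CDP payloads).
type PageEvent =
  | {type: 'frameNavigated'; url: string} // hard navigation: a new document was loaded
  | {type: 'navigatedWithinDocument'; url: string}; // pushState / anchor link

class UrlTracker {
  /** Last URL the browser hard-navigated to; undefined if no navigation was observed. */
  navigationUrl: string | undefined = undefined;
  /** URL currently shown for the main frame. */
  frameUrl: string;

  constructor(initialFrameUrl: string) {
    this.frameUrl = initialFrameUrl;
  }

  handle(event: PageEvent): void {
    // Both kinds of event change the URL shown for the frame.
    this.frameUrl = event.url;
    // Only a hard navigation changes the navigation URL.
    if (event.type === 'frameNavigated') {
      this.navigationUrl = event.url;
    }
  }
}

const tracker = new UrlTracker('about:blank');
tracker.handle({type: 'frameNavigated', url: 'https://example.com/'});
tracker.handle({type: 'navigatedWithinDocument', url: 'https://example.com/#section'});

console.log(tracker.navigationUrl); // https://example.com/
console.log(tracker.frameUrl); // https://example.com/#section
```

If the tracker only starts listening after the page has loaded (as in timespan or snapshot mode), `navigationUrl` stays `undefined` while `frameUrl` is still known.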
Problem
For navigations, gatherers need to know the navigated URL in order to find the main document, and there can be issues if the frame URL is provided instead. #13699 will ensure consistent use of the navigation URL for navigation mode. However, timespan and snapshot mode cannot resolve the navigated URL without `Page.frameNavigated` events, so they must use the frame URL with `page.url()` instead.
In the LHR `requestedUrl`/`finalUrl`, we use the nav URL, which can be confusing to the end user, who probably expects the frame URL (https://github.com/GoogleChrome/lighthouse/issues/13697). Again, this does not apply to timespan/snapshot, which have to use the frame URL everywhere.
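As an illustration of why the frame URL can cause issues when looking for the main document, consider the sketch below. The `NetworkRecord` shape and the exact-match lookup in `findMainDocument` are assumptions made for this example, not Lighthouse's real data model or lookup logic.

```typescript
// Hypothetical, trimmed-down network record; real records carry far more fields.
interface NetworkRecord {
  url: string;
  resourceType: 'Document' | 'Script' | 'Image' | 'Other';
}

/** Returns the document request whose URL matches exactly, or undefined if none does. */
function findMainDocument(records: NetworkRecord[], url: string): NetworkRecord | undefined {
  return records.find(r => r.resourceType === 'Document' && r.url === url);
}

const records: NetworkRecord[] = [
  {url: 'https://example.com/', resourceType: 'Document'},
  {url: 'https://example.com/app.js', resourceType: 'Script'},
];

// After the page calls history.pushState, the two URLs diverge.
const navigationUrl = 'https://example.com/';
const frameUrl = 'https://example.com/checkout';

console.log(findMainDocument(records, navigationUrl)?.url); // https://example.com/
console.log(findMainDocument(records, frameUrl)); // undefined
```

No network request was ever made for the frame URL in this scenario, so a lookup keyed on it finds nothing.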
Once #13699 is merged, the following table shows which "type" of URL will be returned from each source:
| Mode | Gather `context.url` | `artifacts.URL` | `lhr.requestedUrl` / `lhr.finalUrl` | `driver.url()` / `page.url()` |
|---|---|---|---|---|
| Legacy | Navigation URL | Navigation URL | Navigation URL | N/A |
| Navigation | Navigation URL | Navigation URL | Navigation URL | Frame URL |
| Timespan | Frame URL | Frame URL | Frame URL | Frame URL |
| Snapshot | Frame URL | Frame URL | Frame URL | Frame URL |
Solution
To ensure the "type" of URL is consistent in all three modes, I propose the following setup:
| Mode | Gather `context.url` | `artifacts.URL.*` | `lhr.requestedUrl` | `lhr.finalUrl` | `driver.url()` / `page.url()` |
|---|---|---|---|---|---|
| Legacy | Deprecated | See below | Navigation URL | Frame URL | N/A |
| Navigation | Deprecated | See below | Navigation URL | Frame URL | Frame URL |
| Timespan | Deprecated | See below | N/A | Frame URL | Frame URL |
| Snapshot | Deprecated | See below | N/A | Frame URL | Frame URL |
New URL artifact:
```ts
interface URL {
  /** URL of the main frame before Lighthouse starts */
  initialUrl: string;
  /** URL of the first document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
  requestedUrl?: string;
  /** URL of the last document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
  mainDocumentUrl?: string;
  /** URL of the main frame after Lighthouse finishes */
  finalUrl: string;
}
```
Some notes on the above proposal:
- Gather `context.url` is deprecated because we can get the Nav URL from `artifacts.URL` and the frame URL from `driver.url()`.
- `lhr.requestedUrl` will be an optional property that only appears on navigation LHRs.
- Add new `initialUrl` to be a staple of every `artifacts.URL`.
  - Frame URL
  - Would be `about:blank` on most navigations.
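A rough sketch of how the proposed artifact could be populated per mode follows. `GatherMode`, `UrlObservations`, and `buildUrlArtifact` are hypothetical names introduced for this example, and the interface is renamed `URLArtifact` here only to avoid clashing with the global `URL` class. How a real runner collects these values is not specified in this issue.

```typescript
type GatherMode = 'navigation' | 'timespan' | 'snapshot';

/** Same fields as the proposed `artifacts.URL` interface. */
interface URLArtifact {
  initialUrl: string;
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalUrl: string;
}

/** Hypothetical inputs a runner might have collected. */
interface UrlObservations {
  frameUrlBefore: string;
  frameUrlAfter: string;
  /** Document request URLs seen during the navigation, in order (redirects included). */
  documentRequestUrls: string[];
}

function buildUrlArtifact(mode: GatherMode, obs: UrlObservations): URLArtifact {
  // The frame URLs are available in every mode.
  const artifact: URLArtifact = {
    initialUrl: obs.frameUrlBefore,
    finalUrl: obs.frameUrlAfter,
  };
  const first = obs.documentRequestUrls[0];
  const last = obs.documentRequestUrls[obs.documentRequestUrls.length - 1];
  // The navigation URLs only exist when Lighthouse itself navigated.
  if (mode === 'navigation' && first !== undefined && last !== undefined) {
    artifact.requestedUrl = first;
    artifact.mainDocumentUrl = last;
  }
  return artifact;
}

const navArtifact = buildUrlArtifact('navigation', {
  frameUrlBefore: 'about:blank',
  frameUrlAfter: 'https://example.com/home#top',
  documentRequestUrls: ['http://example.com/', 'https://example.com/home'],
});

const snapshotArtifact = buildUrlArtifact('snapshot', {
  frameUrlBefore: 'https://example.com/home#top',
  frameUrlAfter: 'https://example.com/home#top',
  documentRequestUrls: [],
});

console.log(navArtifact.mainDocumentUrl); // https://example.com/home
console.log(snapshotArtifact.mainDocumentUrl); // undefined
```

In the navigation example, `finalUrl` differs from `mainDocumentUrl` because the page changed its fragment after loading.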
Implementation Plan
- [x] Add `initialUrl` and `mainDocumentUrl` to the `artifacts.URL`.
- [x] Switch audit/computed artifact usages of `artifacts.URL.finalUrl` to `artifacts.URL.mainDocumentUrl`
- [x] [Possibly breaking?] Deprecate `context.url`
- [x] [Breaking] Make `requestedUrl` undefined in timespan/snapshot on `artifacts.URL` and the LHR
- [ ] [Breaking] Add `finalDisplayedUrl` and deprecate `finalUrl`
- [ ] Add `mainDocumentUrl` to the LHR
- [ ] Remove `initialUrl` and hold until we actually need it
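For the remaining unchecked items, one possible mapping from the artifact to the LHR might look like the sketch below. `LhrUrls` and `toLhrUrls` are hypothetical names. The fallback used for the deprecated `finalUrl` in timespan/snapshot mode is purely an assumption of this example, since the plan above does not say what legacy consumers should see when there was no navigation.

```typescript
/** The `artifacts.URL` shape proposed above (renamed to avoid the global `URL`). */
interface URLArtifact {
  initialUrl: string;
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalUrl: string;
}

/** Hypothetical LHR URL fields once the unchecked items land. */
interface LhrUrls {
  requestedUrl?: string;
  mainDocumentUrl?: string;
  finalDisplayedUrl: string;
  /** @deprecated Kept only for legacy consumers. */
  finalUrl: string;
}

function toLhrUrls(url: URLArtifact): LhrUrls {
  return {
    requestedUrl: url.requestedUrl,
    mainDocumentUrl: url.mainDocumentUrl,
    // The frame URL after the run gets an unambiguous name.
    finalDisplayedUrl: url.finalUrl,
    // ASSUMPTION: legacy `finalUrl` is the final navigation URL when one exists,
    // otherwise it falls back to the displayed URL.
    finalUrl: url.mainDocumentUrl ?? url.finalUrl,
  };
}

const navigationLhr = toLhrUrls({
  initialUrl: 'about:blank',
  requestedUrl: 'http://example.com/',
  mainDocumentUrl: 'https://example.com/home',
  finalUrl: 'https://example.com/home#top',
});

console.log(navigationLhr.finalDisplayedUrl); // https://example.com/home#top
console.log(navigationLhr.finalUrl); // https://example.com/home
```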
Related
https://github.com/GoogleChrome/lighthouse/issues/8984
Interesting development: I think we can use `Page.getNavigationHistory` to resolve the `mainDocumentUrl` in timespan and snapshot mode. This would mean `mainDocumentUrl` never needs to be undefined. I'll do some more investigation to see if this will always give the correct value.
Edit: Perhaps not. I was hoping the `userTypedUrl` field would give us the document URL, but it isn't the document URL if there was a 3xx redirect.
From our discussions in https://github.com/GoogleChrome/lighthouse/pull/13819, it seems like we have 3 bikeshedding decisions to make:
1. Do we keep `finalUrl` on the LHR as the final navigation URL? This will be purely for legacy support, our own clients should never use this.
2. What do we call the final navigation URL? This issue originally proposed `mainDocumentUrl` <<<<<<<<. Other ideas proposed:
   - `mainResourceUrl`
   - `finalNavigationUrl`
   - `finalNavigatedUrl`
   - `finalNetworkUrl`
   - `htmlDocumentUrl`
3. What do we call the final frame URL? This issue originally proposed using `finalUrl`, I don't think anyone wants this anymore. Other ideas proposed:
   - `finalFrameUrl`
   - `displayedUrl`
   - `finalDisplayUrl`
   - `finalDisplayedUrl` <<<<<<<<<<<
   - `finalPageUrl`
   - `endUrl`
   - `lastUrl`
   - `endingUrl`
I'm not including finalDocumentUrl because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.
My preferences:
1: Keep finalUrl on the LHR only, for legacy support
2: mainDocumentUrl because that's what we have right now
3: finalPageUrl, maybe swap initialUrl -> initialPageUrl
> I'm not including `finalDocumentUrl` because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.
brendan also pitched "navigated"/"navigation" for the `finalUrl`/`finalNetworkUrl` concept. IMO this doesn't work since the navigation API changes the URL but not this one.