lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Distinguish Navigation URLs and Frame URLs

Open adamraine opened this issue 3 years ago • 3 comments

Terminology

This doesn't have to stick, I just need it to make writing the issue easier.

  • Navigated (Navigation?, Document?) URL: The the last URL the browser performed a hard navigation to.
    • Lighthouse resolves this URL tracking CDP Page.frameNavigated events.
    • Can only be determined in Lighthouse if there was a navigation
  • Frame URL: The URL that appears in the search bar. Can be changed with history.pushState or anchor links without making an additional network request or performing a hard navigation.
    • Lighthouse legacy navigation runner does not use this URL anywhere.
    • URL can queried using Page.getFrameTree.
    • Lighthouse FR runners can resolve this URL using the Puppeteer page.url() function (wrapped by driver.url()).

Problem

For navigations, gatherers need to know the navigated URL in order to find the main document, and there can be issues if the frame URL is provided instead. #13699 will ensure consistent use of the navigation URL for navigation mode. However, timespan and snapshot mode cannot resolve the navigated URL without Page.frameNavigated events, so they must use the frame url with page.url() instead.

In the LHR requestedUrl/finalUrl, we use the nav URL which can be confusing to the end user who probably expects the frame URL https://github.com/GoogleChrome/lighthouse/issues/13697. Again, this does not apply timespan/snapshot which have to use the frame URL everywhere.

Once #13699 is merged the following shows when each "type" of URL will be returned from different sources:

Gather context.url artifacts.URL lhr.requestedUrl / lhr.finalUrl driver.url() / page.url()
Legacy Navigation URL Navigation URL Navigation URL N/A
Navigation Navigation URL Navigation URL Navigation URL Frame URL
Timespan Frame URL Frame URL Frame URL Frame URL
Snapshot Frame URL Frame URL Frame URL Frame URL

Solution

To ensure the "type" of URL is consistant in all three modes, I propose the following setup:

Gather context.url artifacts.URL.* lhr.requestedUrl lhr.finalUrl driver.url() / page.url()
Legacy Deprecated See below Navigation URL Frame URL N/A
Navigation Deprecated See below Navigation URL Frame URL Frame URL
Timespan Deprecated See below N/A Frame URL Frame URL
Snapshot Deprecated See below N/A Frame URL Frame URL

New URL artifact:

interface URL {
	/** URL of the main frame before Lighthouse starts */
	initialUrl: string;
	/** URL of the first document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	requestedUrl?: string;
	/** URL of the last document request during a Lighthouse navigation. `undefined` in timespan/snapshot modes. */
	mainDocumentUrl?: string;
	/** URL of the main frame after Lighthouse finishes */
	finalUrl: string;
}

Some notes on the above proposal:

  • Gather context.url is deprecate because we can get the Nav URL from artifacts.URL and the frame URL from driver.url()
  • lhr.requestedUrl will be an optional property that only appears on navigation LHRs.
  • Add new initialUrl to be a staple of every artifacts.URL
    • Frame URL
    • Would be about:blank on most navigations.

Implementation Plan

  • [x] Add initialUrl and mainDocumentUrl to the artifacts.URL.
  • [x] Switch audit/computed artifact usages of artifacts.URL.finalUrl to artifacts.URL.mainDocumentUrl
  • [x] [Possibly breaking?] Deprecate context.url
  • [x] [Breaking] Make requestedUrl undefined in timespan/snapshot on artifacts.URL and the LHR
  • [ ] [Breaking] Add finalDisplayedUrl and deprecate finalUrl
  • [ ] Add mainDocumentUrl to the LHR
  • [ ] Remove initialUrl and hold until we actually need it

Related

https://github.com/GoogleChrome/lighthouse/issues/8984

adamraine avatar Feb 25 '22 17:02 adamraine

Interesting development, I think we can use Page.getNavigationHistory to resolve the mainDocumentUrl in timespan and snapshot mode. This would mainDocumentUrl doesn't ever need to be undefined. I'll do some more investigation to see if this will always give the correct value.

Edit: Perhaps not, I was hoping the userTypedUrl field would give us the document URL, but it isn't the document url if there was a 300 redirect.

adamraine avatar Mar 16 '22 22:03 adamraine

From our discussions in https://github.com/GoogleChrome/lighthouse/pull/13819, it seems like we have 3 bikeshedding decisions to make:

  1. Do we keep finalUrl on the LHR as the final navigation URL? This will be purely for legacy support, our own clients should never use this.
  2. What do we call the final navigation URL? This issue originally proposed mainDocumentUrl <<<<<<<<, other ideas proposed:
    • mainResourceUrl
    • finalNavigationUrl
    • finalNavigatedUrl
    • finalNetworkUrl
    • htmlDocumentUrl
  3. What do we call the final frame URL? This issue originally proposed using finalUrl, I don't think anyone wants this anymore. Other ideas proposed:
    • finalFrameUrl
    • displayedUrl
    • finalDisplayUrl
    • finalDisplayedUrl <<<<<<<<<<<
    • finalPageUrl
    • endUrl
    • lastUrl
    • endingUrl

I'm not including finalDocumentUrl because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.

My preferences: 1: Keep finalUrl on the LHR only, for legacy support 2: mainDocumentUrl because that's what we have right now 3: finalPageUrl, maybe swap initialUrl -> initialPageUrl

adamraine avatar Jun 14 '22 21:06 adamraine

I'm not including finalDocumentUrl because @connorjclark proposed it for the final frame URL and @brendankenny proposed it for the final document URL (lol). Clearly that name is confusing.

brendan also pitched "navigated"/"navigation" for the finalUrl/finalNetworkUrl concept. IMO this doesnt work since the navigation API changes the URL but not this one.

paulirish avatar Jul 07 '22 23:07 paulirish