Expanding the concept of internal resource links
What is the issue with the HTML Standard?
This is a follow-up to #11019.
When integrating <link rel=expect>, we've introduced the concept of "internal resource links". These links use URL fragments, but are not expected to fetch anything or work across documents.
With #11019, it seems like this concept can be very useful for other things as well - importing styles and scripts from an ID of an element within the document.
However, this concept has a few gotchas:
- It falls apart when there is a <base> element, or if the base URL changes for other reasons.
- In non-supporting browsers, this can lead to loading the entire document and trying to treat it as a stylesheet.
For <link rel=expect>, this is not too bad as a faulty expect link would not break the site.
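The <base> gotcha can be illustrated with the standard URL parser. This is only a sketch: the URLs and the `sameDocument` helper are assumptions made up for illustration, not spec text, and the "internal" check here is approximated as "equal to the document URL once fragments are removed".

```javascript
// Assumed example URLs; none of these appear in the discussion.
const documentURL = 'https://example.com/page';
const baseURL = 'https://cdn.example.com/assets/'; // as if set by <base href>

// Without <base>, the fragment-only reference resolves against the document URL.
const withoutBase = new URL('#my-style', documentURL);
// With <base>, the same attribute value resolves against the <base> href instead.
const withBase = new URL('#my-style', baseURL);

// Hypothetical helper: compare two URLs while ignoring their fragments.
function sameDocument(url, docURL) {
  const a = new URL(url);
  a.hash = '';
  const b = new URL(docURL);
  b.hash = '';
  return a.href === b.href;
}

const internalWithoutBase = sameDocument(withoutBase, documentURL);
const internalWithBase = sameDocument(withBase, documentURL);

console.log(withoutBase.href, internalWithoutBase);
console.log(withBase.href, internalWithBase);
```

Under these assumptions the first reference still points at the current document, while the second one points at a different URL and would be treated as an external resource.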
I see 4 options for how to address this:
1. Use fragment URLs, and live with these gotchas. They're not really that common, and can be documented. Loading the document from cache and then ignoring it because it's not a stylesheet would be a bit irritating but nothing more. Note that this is also consistent with how SVG references work for clip-path and filters in CSS.
2. Add a constraint outside the URL. For example, use a special type attribute or a new attribute (inline?) in link that forces only internal resources and ignores <base>.
3. Add a new URL scheme for this, e.g. inline:element-id. This has benefits such as not having any backwards compatibility issues, and also we can give it additional semantics such as checking within a shadow tree first if applicable. However, it feels counter-intuitive to add a new URL scheme for something that's supposedly supported in the platform already. Another benefit of using this is that we can give it immutable semantics, which would be inconsistent with how SVG references work.
4. Treat CSS and JS separately, perhaps by extending importmaps for scripts and using @sheet somehow in CSS.
I am currently leaning towards (2), which allows opting out of the weirdness of <base> and friends without introducing a new unfamiliar thing.
But it would be good to align on this conceptually before we dive into the particular implementations like #11019.
/cc @annevk @zcorpan @emilio @domenic @KurtCattiSchmidt @smaug----
Good outline of the problem and potential solutions.
I find (1) pretty reasonable. I think the <base> gotcha is rare, and I think non-supporting browsers during the transition period is a short-term problem that it's not worth complexifying the platform for. (And, during the transition period, there should be some JavaScript feature detection possible, right?)
One thing I didn't realize until reading the minutes from #11278 was that (1) might be proposing two separate things:
- 1a: use fragment URLs
- 1b: modify the semantics of fragment URLs, for these cases, to match SVG and CSS, instead of HTML. Where they ignore the base URL or document URL, and instead reach directly for elements. (For CSS that is specced here.)
I am not sure how comfortable I am with 1b, or whether it's necessary.
I guess for CSS it's done because it's useless to refer to foo.css#id within the external stylesheet (which would be the behavior if you applied the normal URL resolution algorithm). That is, in CSS, resolving URLs normally would always be useless in external stylesheets.
Whereas in HTML, resolving URLs normally works fine, except if you use <base>.
So I think we can avoid 1b, and continue using the current spec text for matching, if we go with just 1a.
> One thing I didn't realize until reading the minutes from #11278 was that (1) might be proposing two separate things:
> - 1a: use fragment URLs
> - 1b: modify the semantics of fragment URLs, for these cases, to match SVG and CSS, instead of HTML. Where they ignore the base URL or document URL, and instead reach directly for elements. (For CSS that is specced here.)
This was not proposed in the OP, but rather an idea that was explored during the call.
> I am not sure how comfortable I am with 1b, or whether it's necessary.
> I guess for CSS it's done because it's useless to refer to foo.css#id within the external stylesheet (which would be the behavior if you applied the normal URL resolution algorithm). That is, in CSS, resolving URLs normally would always be useless in external stylesheets.
That's not the case. CSS allows referring to an inline SVG element from clip-path and filter.
> Whereas in HTML, resolving URLs normally works fine, except if you use <base>.
> So I think we can avoid 1b, and continue using the current spec text for matching, if we go with just 1a.
I still think that <base> is a useless footgun in these cases, but I agree that we can explore 1b separately.
One idea for 1b is to add an attribute, like <base external> that makes <base> work only for external links like <a href=...> and not to fragment-only links. But as I said, I agree that we should start with 1a.
The 1a current spec text processes an internal reference link given the element's node document, yes?
To reach out of a shadow, some sort of fit-to-purpose tree-scoped reference resolution spec is still needed.
The explainer proposes that fragment references always refer to the light DOM.
Some comments on the main issue suggest that use cases like streaming SSR need to also reach across and into shadows.
foo.css#id wouldn't be needed with a general purpose attribute like condition="sheet(foo bar)" or condition="layer(something)" (#7540).
Or with a feature-specific attribute (#11022).
Since this feature is intended to support https://github.com/whatwg/html/issues/11019, it should specifically address the concern that these URL fragments really should be targetable between different DOM scopes, see https://github.com/whatwg/html/issues/11019#issuecomment-2802714657 and https://github.com/whatwg/html/issues/11019#issuecomment-2803414242
> Since this feature is intended to support #11019, it should specifically address the concern that these URL fragments really should be targetable between different DOM scopes, see #11019 (comment) and #11019 (comment)
I suggest we leave the shadow-DOM discussion in #11019 for now, as it's style-specific. I specifically wanted to address the higher-level issue here and we can keep talking about style/shadow-DOM in #11019.
I don't really understand what "live with these gotchas" means or why <base> alone is problematic here. What about pushState()? What is the proposed processing model?
> I don't really understand what "live with these gotchas" means or why <base> alone is problematic here. What about pushState()? What is the proposed processing model?
The pushState problem is actually a parsing time issue. It is caused when parsing the relative (#fragment) URL to an absolute URL (/page-a#fragment) is done at point A, and used at point B (document URL is page-b, link URL is previously parsed to be page-a#fragment), at which point it might no longer be an internal resource link.
This is an issue today in the processing model of <link rel=expect>, and can be treated as a bug.
The suggested processing model is that the absolute URL for <link rel=expect>, and for any future internal resource link, would be resolved at the last minute before "fetching" the resource. This would resolve the issue since those links would stay internal, resilient to the document's URL changing due to pushState.
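The difference between parsing at "point A" and resolving at the last minute can be sketched as follows. This is an illustration under assumptions, not the spec algorithm: the URLs are made up, pushState is modelled by simply reassigning a `documentURL` variable, and `isInternal` is a hypothetical helper that compares URLs with fragments removed.

```javascript
// Hypothetical helper: serialize a URL without its fragment.
function stripFragment(url) {
  const u = new URL(url);
  u.hash = '';
  return u.href;
}

// Hypothetical check: does the absolute URL still point into the current document?
function isInternal(absoluteURL, documentURL) {
  return stripFragment(absoluteURL) === stripFragment(documentURL);
}

const href = '#fragment'; // the attribute value as authored
let documentURL = 'https://example.com/page-a'; // assumed example URL

// Point A: the relative URL is parsed eagerly and the absolute result is stored.
const parsedEarly = new URL(href, documentURL).href;

// The page calls pushState(); modelled here as a change of the document URL.
documentURL = 'https://example.com/page-b';

// Point B, eager model: the stored URL now points at a different document.
const earlyStillInternal = isInternal(parsedEarly, documentURL);

// Point B, late model: resolve right before use, against the current URL.
const parsedLate = new URL(href, documentURL).href;
const lateStillInternal = isInternal(parsedLate, documentURL);

console.log({ parsedEarly, earlyStillInternal, parsedLate, lateStillInternal });
```

In this sketch only the late-resolved link remains internal after the URL change, which is the behavior the suggested processing model is after.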
Apart from this timing issue and <base>, are there any other known gotchas with internal resource links?
I don't see <base> as a big issue, and we can perhaps mitigate it by adding an attribute to <base>, e.g. <base external>, that would un-apply it to internal resource links.
I think that is rather confusing behavior.
It also means that you have to use #blah in order for these links to work reliably. You cannot use /document#blah as that would become problematic the moment someone deploys pushState(), right? (You still haven't detailed the processing model, though it sounds like you want to parse and then immediately compare against document's URL.)
(Note that <base> has a similar issue, in that it's very much dynamic as well.)
This timing issue also shows up if you attempt to serialize any of this state (e.g., invoke the <link>.href getter), and do something with the result at a later point.
Now a lot of this applies to <a href> URLs with fragments as well, but at least those can be used for local and external references. It's quite a bit trickier to end up in a completely broken state.
> I think that is rather confusing behavior.
It is somewhat consistent with the existing behavior of CSS SVG references (clip-path, filter).
> It also means that you have to use #blah in order for these links to work reliably. You cannot use /document#blah as that would become problematic the moment someone deploys pushState(), right? (You still haven't detailed the processing model, though it sounds like you want to parse and then immediately compare against document's URL.)
That's the case today with relative URLs (./something) and pushState. I don't see it as confusing.
> (Note that <base> has a similar issue, in that it's very much dynamic as well.)
> This timing issue also shows up if you attempt to serialize any of this state (e.g., invoke the <link>.href getter), and do something with the result at a later point.
Correct
> Now a lot of this applies to <a href> URLs with fragments as well, but at least those can be used for local and external references. It's quite a bit trickier to end up in a completely broken state.
The difference is that fragment URLs in <a href> have legitimate uses, whereas they have none in "src"-like attributes, such as those used for a script or for importing a style.
Anyway, I think this is more straightforward but the caveats you describe make me think that the dedicated URL scheme option can go back into consideration, with the added value of being able to give it some new semantics around shadow DOM
CSS has very specific processing for URL input that starts with # as per https://drafts.csswg.org/css-values/#local-urls. Though the extent to which that is properly implemented and consistently applied across CSS is unclear. This would not be consistent with that as I understand it. (E.g., that would not allow /document#blah to be considered a local URL.)
> CSS has very specific processing for URL input that starts with # as per https://drafts.csswg.org/css-values/#local-urls. Though the extent to which that is properly implemented and consistently applied across CSS is unclear. This would not be consistent with that as I understand it. (E.g., that would not allow /document#blah to be considered a local URL.)
Perhaps we could make something consistent with that? As in, in cases of importing scripts/styles etc., if a URL starts with #, treat it as something similar to a CSS local URL (and define that beast a bit better).
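A rough sketch of what "starts with #" could mean in practice, assuming the check is made on the authored attribute value before any URL parsing. The function names are hypothetical, and details such as percent-decoding of the ID are deliberately left out.

```javascript
// Hypothetical check modelled on the CSS local-URL idea: only values that
// literally begin with "#" count as local, regardless of base or document URL.
function isLocalReference(value) {
  return value.startsWith('#');
}

// Hypothetical helper: return the referenced ID, or null when the value is not
// a local reference and would go through normal URL resolution instead.
function localTargetId(value) {
  return isLocalReference(value) ? value.slice(1) : null;
}

console.log(localTargetId('#blah'));          // "blah"
console.log(localTargetId('/document#blah')); // null
```

With this rule, #blah never depends on <base> or pushState(), while /document#blah is simply not a local reference, matching the point about CSS made in the quoted comment.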
Thanks for putting this together @noamr! I agree that option 1 is the least bad.
> CSS has very specific processing for URL input that starts with # as per https://drafts.csswg.org/css-values/#local-urls. Though the extent to which that is properly implemented and consistently applied across CSS is unclear. This would not be consistent with that as I understand it. (E.g., that would not allow /document#blah to be considered a local URL.)
This is a great point. I also want to discuss how ID scoping should work for fragment identifiers. The most obvious is to follow HTML5 scoping rules (as <link rel=expect> currently does), but I would argue that is somewhat broken today for both local anchors and <link rel=expect>.
HTML currently scopes local anchor references and <link rel=expect> to only look at the global document scope. This means that local anchor references don't work at all for identifiers in shadow roots (even when linked from a shadow root). The same applies for <link rel=expect>, in that you cannot block rendering based on an ID within a shadow root (correct me if I'm wrong here @noamr). This seems broken to me.
CSS's concept of a "tree-scoped reference" feels more intuitive for these cases, in that parent scopes are searched as well. "(In other words, tree-scoped names "inherit" into descendant shadow trees, so long as they don’t define the same name themselves.)"
But another option is to allow the ID references to target anything within the flat tree (i.e. global scoping). This was specifically desired for this feature by @justinfagnani here https://github.com/whatwg/html/issues/11019#issuecomment-2802714657
So to summarize:
- HTML scoping (IDs within shadow roots are not accessible)
- CSS tree-scoped reference semantics (IDs in parent shadow roots are accessible)
- Flat tree scoping (global scope)
Any thoughts on this?
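To make the three options concrete, here is a minimal sketch over a toy model rather than the real DOM. Everything in it is an assumption for illustration: a "scope" stands for either the document or a shadow root, holds a set of IDs, and knows the scope that contains its host.

```javascript
// Toy model: a scope is the document or a shadow root. `parent` is the scope
// containing the shadow host (null for the document).
function makeScope(ids, parent = null) {
  const scope = { ids: new Set(ids), parent, shadowScopes: [] };
  if (parent) parent.shadowScopes.push(scope);
  return scope;
}

function documentOf(scope) {
  while (scope.parent) scope = scope.parent;
  return scope;
}

// Option 1, HTML scoping: only IDs in the document scope are visible.
function resolveHtml(id, fromScope) {
  const doc = documentOf(fromScope);
  return doc.ids.has(id) ? doc : null;
}

// Option 2, tree-scoped semantics: look in the referencing scope first,
// then walk outwards through the enclosing scopes.
function resolveTreeScoped(id, fromScope) {
  for (let s = fromScope; s; s = s.parent) {
    if (s.ids.has(id)) return s;
  }
  return null;
}

// Option 3, global scoping: search the document and every shadow scope.
function resolveGlobal(id, fromScope) {
  const pending = [documentOf(fromScope)];
  while (pending.length > 0) {
    const s = pending.pop();
    if (s.ids.has(id)) return s;
    pending.push(...s.shadowScopes);
  }
  return null;
}

// Example: document > outer shadow > inner shadow, plus a sibling shadow.
const doc = makeScope(['a']);
const outer = makeScope(['b'], doc);
const inner = makeScope(['c'], outer);
const sibling = makeScope(['d'], doc);
```

In this model, a reference made from `inner` to `b` fails under HTML scoping, succeeds under tree-scoped semantics, and a reference to `d` in the sibling shadow only succeeds under global scoping.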
> But another option is to allow the ID references to target anything within the flat tree (i.e. global scoping). This was specifically desired for this feature by @justinfagnani here #11019 (comment)
I do strongly believe we need a way to reference declarative style sheets with a global identifier, but I don't think we should do that by changing any scoping rules for existing things like IDs.
I think we either need a new global namespace (what I was referring to with the xid placeholder attribute name in other issues), or to use an existing global namespace like module specifiers.
Note that the processing model for fragment URLs and internal resource links is well-defined already, it's the 1a model:
- Navigation: https://html.spec.whatwg.org/#beginning-navigation "url equals navigable's active session history entry's URL with exclude fragments set to true".
- <link rel=expect>: https://html.spec.whatwg.org/#process-internal-resource-link "url does not equal doc's URL with exclude fragments set to false" (it's negated for early return purposes)
So from my perspective 1a (HTML-like fragment URL resolution) continues to be fine, and we don't need to move to 1b (CSS-like fragment URL resolution).
To summarize:
#foo
| Spec strategy | Base case | After pushState() | After <base> insertion |
|---|---|---|---|
| 1a | Works | Works | Works |
| 1b | Works | Works | Works |
For 1a, pushState() and <base> change how both the document's URL and the URL #foo are parsed into absolute URLs, so the comparison still works before and after.
/document#foo
| Spec strategy | Base case | After pushState() | After <base> insertion |
|---|---|---|---|
| 1a | Works | Doesn't work | Doesn't work |
| 1b | Doesn't work | Doesn't work | Doesn't work |
> The pushState problem is actually a parsing time issue.
FWIW, I agree with this, and I think it could be fixed relatively easily and I would be surprised if that change wasn't web compatible.