URL.relative proposal

Open guybedford opened this issue 7 years ago • 28 comments

As Node.js has moved to embrace the URL spec over the legacy url API, there is still an API gap for handling relative URLs as discussed in https://github.com/nodejs/node/issues/616.

To follow on from that discussion, and discussion in the Node.js tooling group, it would be worth seeing if a proposal for a URL.relative might be considered here.

The API might look something like the following:

URL.relative(fromURL: URL | string, toURL: URL | string) -> string

where it returns the relative url with fragment string of toURL from fromURL, such that new URL(result, fromURL) would be the inverse operation.
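As a rough illustration of the inverse property described above, the sketch below checks whether a candidate relative string resolves back to toURL. The helper name `resolvesBackTo` and the example URLs are made up for illustration; only the `new URL(input, base)` constructor comes from the URL Standard.

```javascript
// Hypothetical helper (not part of any spec): checks the round-trip
// property the proposal asks of the string returned by URL.relative.
function resolvesBackTo(relative, fromURL, toURL) {
  // new URL(relative, base) resolves `relative` against `base`.
  return new URL(relative, fromURL).href === new URL(toURL).href;
}

// Illustrative URLs, not taken from the proposal.
const from = 'https://example.com/docs/guide/index.html';
const to = 'https://example.com/docs/api/url.html#relative';

console.log(resolvesBackTo('../api/url.html#relative', from, to)); // true
console.log(resolvesBackTo('api/url.html#relative', from, to));    // false
```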

Node.js certainly can go its own way here too, but we thought it would just be good to find out first if WhatWG would consider something along these lines.

guybedford avatar Dec 07 '18 14:12 guybedford

I think that creating relative URLs is not a common need and belongs in user land. Not just because I've already created one, too :wink:. While there are some complexities that might benefit from a standard, it could still be compared to rimraf not being part of the core Node.js platform.

stevenvachon avatar Dec 07 '18 15:12 stevenvachon

Do we know if this is used outside the domain of servers?

annevk avatar Dec 07 '18 18:12 annevk

With service workers, any URL operations done on a server are entirely likely to be needed in the browser.

ljharb avatar Dec 07 '18 19:12 ljharb

Copying my comment from https://github.com/whatwg/url/issues/136#issuecomment-423715051:

I'd like to push for some notion of a "relative URL" as a first-class object, at least in the spec terminology if not in the API. I'm working on defining a file format that includes a construct for referring to documents from other documents, naturally using URLs.

The thing is, I want to define some behavior around this construct that's independent of the base URL, but still uses URL notions like the path. The spec as it stands provides me no way of saying "this is a relative URL, and the path must contain at least one element that's an identifier" or whatever.

To put this more generally: tools often process documents at build time, in a context-free manner, where the base URL is not known. They may even process documents that may be used in multiple contexts with different base URLs. I believe this is why humans tend to gravitate towards referring to "relative URLs" as objects in and of themselves, and there's value in the spec matching this intuition in some way. Right now, it provides no way to even talk about a URL unless you have all the context of how the document that contains it will be used.

One possibility would be for the spec to include a definition of "relative URL" that explicitly includes an initial "ambiguous segment" and forbids certain operations if that segment is non-empty.

nex3 avatar Dec 07 '18 19:12 nex3

@ljharb well, service workers do not really have to deal with a file system and everything they can touch is keyed on URLs.

cc @jakearchibald to make sure I'm not wrong.

It would be really great to see some evidence this is indeed a somewhat widespread problem for which standardizing on a single library makes sense. E.g., https://www.npmjs.com/package/url-relative does not seem that popular, although it does suggest this might be a rather trivial addition complexity-wise.

annevk avatar Dec 08 '18 11:12 annevk

@annevk that library is a piece of crap. Try running most anything from relateurl's ~5000 test fixtures through it and it'll break.

stevenvachon avatar Dec 08 '18 15:12 stevenvachon

@stevenvachon please stay respectful, see https://whatwg.org/code-of-conduct.

I had missed that https://www.npmjs.com/package/relateurl is much more popular, though. Looking at that and its dependencies, it also seems a good bit more complex.

annevk avatar Dec 10 '18 09:12 annevk

Okay, the "url-relative" package was created as lazily as possible and only contributes to the growing amount of useless drivel found within the NPM registry. It distracts from real progress by injecting confusion into the discovery phase. It was likely created solely out of pretentiousness.

Regarding relateurl's complexity: it's not complex by choice, but out of necessity. There are multiple definitions of a "relative" URL.

stevenvachon avatar Dec 10 '18 15:12 stevenvachon

And in case you were reviewing relateurl via its npm page, check out its repository instead, as its currently unreleased version uses URL.

stevenvachon avatar Dec 10 '18 15:12 stevenvachon

There could be many return values that would satisfy the condition that the constructor be an inverse, and each has its use. For example, a relative URL from https://example.com/full/ to https://example.com/full/path/resource could return a string in the form //example.com/full/path/resource, and such relative URLs are commonly used when the scheme of the base is important but everything else must be specified. It could also return /full/path/resource if the full path is important, ../full/path/resource if the relative structure is important, or path/resource if only that part of the structure is important. Many applications will want the shortest version, but some applications will want to specify which form they want. This ambiguity would require a complicated API and specification, and I agree that if someone wants a particular form of this functionality they should write their own library.
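A quick way to see that all four forms satisfy the inverse condition is to resolve each one against the base with the standard `URL` constructor. This is only a check of the example above, with variable names chosen for illustration.

```javascript
const base = 'https://example.com/full/';
const target = 'https://example.com/full/path/resource';

// The four candidate representations mentioned in the comment.
const candidates = [
  '//example.com/full/path/resource', // scheme-relative
  '/full/path/resource',              // absolute path
  '../full/path/resource',            // path-relative via the parent directory
  'path/resource',                    // shortest path-relative form
];

// Resolve each candidate against the base.
const results = candidates.map((candidate) => new URL(candidate, base).href);
const allMatch = results.every((href) => href === target);

console.log(allMatch); // true
```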

achristensen07 avatar Dec 10 '18 18:12 achristensen07

For example, a relative URL from https://example.com/full/ to https://example.com/full/path/resource could return a string in the form //example.com/full/path/resource

I'd like to explicitly state that the use case of getting //example.com/full/path/resource from https://example.com/full/path/resource is outside the scope of the above proposal, and that if you are interested in this, you should separately propose a new URL('https://example.com/full/path/resource').schemePath or whatever it would make sense to call it.

The output of the relative URL function remains completely well-defined. The complexities of relateurl can also be done with URL manipulation before running such a function.

@annevk the common scenario is basically any generation or templating of HTML / CSS / JS whose references to other resources need to be relative, so that the output does not encode the base URL and stays independent of where it is served. While in many cases the URL can be assumed to be the same as the page, if the template being generated needs to run on any other base URL, then URL.relative is needed.

guybedford avatar Dec 11 '18 13:12 guybedford

So what would the processing model look like, for the use cases that are in scope?

annevk avatar Dec 11 '18 14:12 annevk

I guess something like -

  1. If the scheme, host and authentication do not match between relURL and baseURL, then
    1. Return relURL.href
  2. Let relPath be the path of relURL.pathname relative to baseURL.pathname, including a leading "./" if relURL.pathname is contained in baseURL.pathname.
  3. Return relPath concatenated with relURL.search and relURL.hash.
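One possible reading of these steps, as a minimal sketch rather than a proposed implementation. The function name `relativeURL` is hypothetical, "authentication" is taken to mean username and password, and the sketch assumes hierarchical URLs such as http(s) whose pathname starts with "/". It always emits a leading "./" when no "../" is needed.

```javascript
// Hypothetical sketch of the steps above; not a standard API.
function relativeURL(relURL, baseURL) {
  const rel = new URL(relURL);
  const base = new URL(baseURL);

  // Step 1: different scheme, host (including port) or credentials.
  if (rel.protocol !== base.protocol || rel.host !== base.host ||
      rel.username !== base.username || rel.password !== base.password) {
    return rel.href;
  }

  // Step 2: directory segments of the base (its last segment is dropped,
  // as URL resolution does), then directories and final segment of rel.
  const baseDirs = base.pathname.split('/').slice(1, -1);
  const relDirs = rel.pathname.split('/').slice(1);
  const relLast = relDirs.pop(); // may be '' when the path ends in '/'

  // Count the leading directories both paths share.
  let common = 0;
  while (common < baseDirs.length && common < relDirs.length &&
         baseDirs[common] === relDirs[common]) {
    common++;
  }

  const ups = baseDirs.length - common;
  const prefix = ups === 0 ? './' : '../'.repeat(ups);
  const relPath = prefix + [...relDirs.slice(common), relLast].join('/');

  // Step 3: append query and fragment.
  return relPath + rel.search + rel.hash;
}

console.log(relativeURL('https://example.com/full/path/resource',
                        'https://example.com/full/')); // ./path/resource
console.log(relativeURL('https://example.com/a/b/c.js?x=1',
                        'https://example.com/a/d/e/f.js')); // ../../b/c.js?x=1
```

Because every result starts with "./" or "../", a first segment containing ":" cannot be mistaken for a scheme when the string is resolved again.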

guybedford avatar Dec 11 '18 14:12 guybedford

A relative URL of "./#hash" will cause a page refresh.

stevenvachon avatar Dec 11 '18 15:12 stevenvachon

A relative URL of "./#hash" will cause a page refresh.

One can always remove .hash / .query before passing the urls to URL.relative.
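A minimal sketch of that pre-processing step, assuming the WHATWG `URL` class, where the query string is exposed as `search` rather than `query`. The helper name is made up for illustration.

```javascript
// Hypothetical helper: returns a copy of the URL without its query
// string and fragment, leaving the input untouched.
function withoutSearchAndHash(input) {
  const url = new URL(input);
  url.search = ''; // clears the query string
  url.hash = '';   // clears the fragment
  return url;
}

console.log(withoutSearchAndHash('https://foo.bar/boz?x=1#qux').href);
// https://foo.bar/boz
```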

Jamesernator avatar Jul 22 '19 08:07 Jamesernator

One can always remove .hash / .query before passing the urls to URL.relative.

That means special-case behaviour for something unexpected.

stevenvachon avatar Jul 22 '19 12:07 stevenvachon

That means special-case behaviour for something unexpected.

It's URL.relative not path.relative, I don't think it's unexpected at all that URL.relative('https://foo.bar/#qux', 'https://foo.bar/boz') is ./#qux given that new URL('./#qux', 'https://foo.bar/boz') is https://foo.bar/#qux.

Jamesernator avatar Jul 23 '19 05:07 Jamesernator

It's pretty unexpected that you chose that particular relative representation, instead of #qux, or //foo.bar/#qux, or other possibilities. See https://github.com/whatwg/url/issues/421#issuecomment-445916341 for more on this topic.
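For reference, this is how the representations mentioned here resolve against the base https://foo.bar/boz from the earlier example, using the standard `URL` constructor. Note that a bare fragment keeps the base's path, so it is only interchangeable with the other forms when both URLs share a path.

```javascript
const base = 'https://foo.bar/boz';
const candidates = ['./#qux', '//foo.bar/#qux', '#qux'];

// Map each candidate string to the absolute URL it resolves to.
const resolved = Object.fromEntries(
  candidates.map((candidate) => [candidate, new URL(candidate, base).href])
);

console.log(resolved);
// './#qux'         resolves to https://foo.bar/#qux
// '//foo.bar/#qux' resolves to https://foo.bar/#qux
// '#qux'           resolves to https://foo.bar/boz#qux
```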

domenic avatar Jul 23 '19 14:07 domenic

It's pretty unexpected that you chose that particular relative representation

This is the shortest form that works everywhere relative imports are supported, other forms like #qux won't work in import() for example.

Jamesernator avatar Jul 30 '19 03:07 Jamesernator

@Jamesernator import is not the only use case for relative URLs.

stevenvachon avatar Jul 30 '19 15:07 stevenvachon

import is not the only use case for relative URLs.

I know this but ./-prefixed urls work everywhere relative urls are accepted in the spec which is why this would make the most sense as it's the most portable.

While import is the only thing I'm aware of that requires ./ for relative urls, this is still a pretty big use case especially with html/css modules + import: urls soon, it'll be more important to be able to construct these urls easily.

Regarding // urls, I think there should probably be two APIs given there's two types of "relative url" defined in the spec: path relative and scheme relative.

Jamesernator avatar Jul 30 '19 15:07 Jamesernator

import doesn't even operate on URLs; it operates on module specifiers. Contorting the requirements of a hypothetical URL API for module specifiers seems like a non-starter.

domenic avatar Jul 30 '19 16:07 domenic

In my own experience working with module specifiers in the module system, the lack of a reliable URL.relative when outputting relative module URLs has been a constant pain point.

I guess relative representations form a sort of hierarchy:

  1. Hash-relative
  2. Path-relative
  3. Absolute path
  4. Absolute protocol path
  5. Full URL

Does that comprehensively cover the forms?

So far we've been suggesting (2) falling back to (5) as being the canonical approach. There are certainly arguments that (3) and (4) are useful too.

If this were C, I'd suggest flags for these forms, that could be composed as a second argument, which would then apply them in hierarchy and fail otherwise, but given that this is a JS API, I'm not sure how best to capture that.

Ideas welcome.
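One idea, sketched under heavy assumptions: instead of C-style flags, the second argument could be an ordered list of acceptable forms, tried in order until one applies. The names `relativeWithForms`, `forms`, and the form keys are all hypothetical, the sketch assumes http(s)-style URLs, and the path-relative form (2) is left out to keep it short.

```javascript
// Hypothetical option-based API; none of these names exist in any spec.
function stripHash(url) {
  const copy = new URL(url);
  copy.hash = '';
  return copy.href;
}

function sameAuthority(a, b) {
  return a.protocol === b.protocol && a.host === b.host &&
         a.username === b.username && a.password === b.password;
}

// Each form returns a string, or null when it cannot express the target.
const forms = {
  // (1) Hash-relative: everything except the fragment must match.
  hash: (from, to) =>
    to.hash !== '' && stripHash(from) === stripHash(to) ? to.hash : null,
  // (3) Absolute path: same scheme, host and credentials. A path starting
  // with '//' is skipped because it would re-parse as scheme-relative.
  absolutePath: (from, to) =>
    sameAuthority(from, to) && !to.pathname.startsWith('//')
      ? to.pathname + to.search + to.hash
      : null,
  // (4) Scheme-relative: same scheme only.
  schemeRelative: (from, to) =>
    from.protocol === to.protocol ? to.href.slice(to.protocol.length) : null,
  // (5) Full URL: always possible.
  full: (from, to) => to.href,
};

function relativeWithForms(fromURL, toURL, allowed = ['absolutePath', 'full']) {
  const from = new URL(fromURL);
  const to = new URL(toURL);
  for (const name of allowed) {
    if (!(name in forms)) throw new TypeError(`unknown form: ${name}`);
    const result = forms[name](from, to);
    if (result !== null) return result;
  }
  throw new TypeError('no allowed form can express the target URL');
}

console.log(relativeWithForms('https://example.com/full/',
                              'https://example.com/full/path/resource'));
// /full/path/resource
```

Passing `['schemeRelative']` for the same pair would yield `//example.com/full/path/resource`, and a list that cannot express the target fails instead of silently falling back.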

guybedford avatar Jul 31 '19 00:07 guybedford

Search-relative.

stevenvachon avatar Jul 31 '19 02:07 stevenvachon

it operates on module specifiers. Contorting the requirements of a hypothetical URL API for module specifiers seems like a non-starter.

Yes, but all "module specifiers" that start with ./, /, or ../ are treated as URLs with a base of whatever the URL of the current module is.
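A simplified sketch of that behaviour, with a hypothetical function name. It handles only the three prefixes named above; absolute URL specifiers and bare specifiers such as "lodash" are outside its scope and yield null here.

```javascript
// Hypothetical helper: resolves a relative-looking module specifier
// against the URL of the importing module.
function resolveRelativeSpecifier(specifier, parentURL) {
  const isRelativeLike =
    specifier.startsWith('/') ||
    specifier.startsWith('./') ||
    specifier.startsWith('../');
  // Anything else is not handled by this sketch.
  return isRelativeLike ? new URL(specifier, parentURL).href : null;
}

const parent = 'https://example.com/app/main.js'; // illustrative module URL

console.log(resolveRelativeSpecifier('./dep.js', parent));
// https://example.com/app/dep.js
console.log(resolveRelativeSpecifier('../lib/x.js', parent));
// https://example.com/lib/x.js
console.log(resolveRelativeSpecifier('lodash', parent)); // null
```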

Jamesernator avatar Jul 31 '19 03:07 Jamesernator

I was looking for similar features and opened the following issue with nodejs

https://github.com/nodejs/node/issues/31874

The spec @ https://url.spec.whatwg.org/#relative-url-with-fragment-string also talks about certain segments of the URL but whatwg/url does not give read access to those fragments.

While certain naive userland libraries might be able to work with a subset of valid URLs, actually writing code that works for all URLs is more involved, and I think a good place for that is in whatwg/url.

https://github.com/nodejs/node/issues/31874 is slightly different than #421. Should I create a new issue in whatwg/url for this?

dodtsair avatar Feb 19 '20 19:02 dodtsair

@dodtsair an issue per feature is best, yes. Evidence that it's needed in terms of library usage / StackOverflow questions would help a lot in evaluating.

annevk avatar Feb 20 '20 08:02 annevk

@guybedford Perhaps it's best to move this proposal from WHATWG (browser focused) to WinterCG (server focused).

It's becoming relevant again now that Node.js has deprecated url.parse(), which works with relative URLs, and there is no good replacement for it.
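To illustrate the gap: the WHATWG `URL` constructor rejects a relative reference unless a base is supplied, so a relative string cannot be parsed on its own. The wrapper name `tryParse` and the example URLs are assumptions made for this sketch.

```javascript
// Hypothetical wrapper: returns a URL object, or null when parsing fails.
function tryParse(input, base) {
  try {
    return new URL(input, base);
  } catch {
    return null; // new URL() throws a TypeError on failure
  }
}

// Without a base, a relative reference cannot be parsed.
console.log(tryParse('../a/b?x=1#frag')); // null

// With a base, the result is always absolute; the relative form is lost.
console.log(tryParse('../a/b?x=1#frag', 'https://example.com/dir/page').href);
// https://example.com/a/b?x=1#frag
```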

styfle avatar Jan 04 '23 14:01 styfle