Differentiate between zero-sized fragment and no fragment in URL

Open lu-zero opened this issue 2 years ago • 7 comments

scheme://host:port/#

and

scheme://host:port/

When parsed by URL, these two are indistinguishable: url.hash returns '' for both.

To make it even stranger, setting .hash = '#' produces scheme://host:port/#, but reading .hash afterwards still returns ''.

It would be nicer if .hash returned undefined/null when the fragment is unset, or "#" when the trailing hash is present.

lu-zero avatar Jul 13 '23 17:07 lu-zero

We cannot change the existing API, but I'm somewhat supportive of adding API surface for this as it is indeed hidden information. For search too. (hasSearch & hasHash seem more palatable.)

Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?
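For illustration, the hasSearch / hasHash idea can be approximated today by inspecting the serialized href. This is only a sketch: the function names hasHash and hasSearch are hypothetical helpers, not part of the URL API, and the approach assumes that the first "#" in a serialized URL always starts the fragment and that a "?" before it always starts the query.

```javascript
// Hypothetical helpers (not part of the URL API): they recover the
// delimiter information from the serialized href.
function hasHash(url) {
  // Assumption: the first "#" in a serialized URL starts the fragment.
  return url.href.includes("#");
}

function hasSearch(url) {
  // Only look for "?" in the part that precedes the fragment.
  const beforeFragment = url.href.split("#", 1)[0];
  return beforeFragment.includes("?");
}

const bare = new URL("scheme://host/path/");
const emptyHash = new URL("scheme://host/path/#");
const emptySearch = new URL("scheme://host/path/?");

// The existing getters report the same value for both URLs...
console.log(bare.hash === emptyHash.hash);
// ...while the helpers tell them apart.
console.log(hasHash(bare), hasHash(emptyHash));
console.log(hasSearch(bare), hasSearch(emptySearch));
```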

annevk avatar Jul 13 '23 21:07 annevk

I found the problem while looking at how the URL fragment is supported across languages while working on another standard, so I cannot tell you how widespread this need is within JS. I guess we'll have to make a note and flag the pitfall.

What surprises me even more is that you do not get back what you set.

let url = new URL("scheme://host/path/");
console.log(url.hash); // -> ''
url.hash = "#";
console.log(url.toString()); // -> scheme://host/path/#
console.log(url.hash); // -> ''
url.hash = "#a";
console.log(url.toString()); // -> scheme://host/path/#a
console.log(url.hash); // -> '#a'

lu-zero avatar Jul 14 '23 07:07 lu-zero

I agree that this part of the JS URL API is awkward. To give another data point: in my library WebURL, which implements the WHATWG standard in Swift, I made this change ("not present" is communicated as nil, not as an empty string) and some other tweaks.

WebURL uses nil to signal that a value is not present, rather than an empty string. This is a more accurate description of components which keep their delimiter even when empty. For example, consider the following URLs:

http://example.com/
http://example.com/?

According to the URL Standard, these URLs are different; however, JavaScript’s search property returns an empty string for both. In fact, these URLs return identical values for every component in JS, and yet still the overall URLs compare as not equal to each other. This has some subtle secondary effects, such as url.search = url.search potentially changing the URL.

WebURL avoids this by saying that the first URL has a nil query (to mean “not present”), and the latter has an empty query. This has the nice property that every unique URL has a unique combination of URL components.

I appreciate that the JS API cannot be changed at this point, though.
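The secondary effect mentioned in the quoted passage can be seen with a short script. This is a sketch of the behaviour as I understand the search setter (an empty value removes the query entirely); the variable names are only for illustration.

```javascript
const withoutQuery = new URL("http://example.com/");
const withEmptyQuery = new URL("http://example.com/?");

// Same component value, different serialized URLs.
console.log(withoutQuery.search === withEmptyQuery.search);
console.log(withoutQuery.href === withEmptyQuery.href);

// Writing back the value that was just read drops the "?".
const before = withEmptyQuery.href;
withEmptyQuery.search = withEmptyQuery.search;
const after = withEmptyQuery.href;

console.log(before);
console.log(after);
```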

karwa avatar Jul 14 '23 10:07 karwa

Host has this problem too.

  • You cannot distinguish sc:///foo from sc:/foo, nor can you distinguish sc: from sc:// by inspecting the properties of their corresponding URL objects (other than the href itself).
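A small sketch of the host case, assuming the serializer only emits "//" directly after the scheme when the URL has a host (possibly an empty one). hasAuthority is a hypothetical helper name, not an existing API.

```javascript
// Hypothetical helper: detects an authority by looking at the serialized href.
function hasAuthority(url) {
  return url.href.startsWith(url.protocol + "//");
}

const withEmptyHost = new URL("sc:///foo");
const withoutHost = new URL("sc:/foo");

// The component getters do not separate the two URLs...
console.log(withEmptyHost.host === withoutHost.host);
console.log(withEmptyHost.pathname === withoutHost.pathname);
// ...only the href does.
console.log(hasAuthority(withEmptyHost), hasAuthority(withoutHost));
```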

There is this classic post according to which query and fragment have been in use fairly consistently to refer to the search without the ? sigil and the hash without the # sigil.

So one option is to fix search and hash and make them available as query and fragment instead. The search and hash getters / setters can then be marked as legacy or deprecated (but not removed).
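As a rough sketch of what sigil-free accessors might return, the following derives them from href, using null for "not present" in the spirit of the WebURL approach described earlier in the thread. queryOf and fragmentOf are hypothetical names; nothing like them exists on URL today.

```javascript
// Hypothetical accessors: null means "not present", "" means "present but empty".
function fragmentOf(url) {
  const hashAt = url.href.indexOf("#");
  return hashAt === -1 ? null : url.href.slice(hashAt + 1);
}

function queryOf(url) {
  // Ignore anything after the fragment delimiter.
  const beforeFragment = url.href.split("#", 1)[0];
  const queryAt = beforeFragment.indexOf("?");
  return queryAt === -1 ? null : beforeFragment.slice(queryAt + 1);
}

console.log(queryOf(new URL("http://example.com/")));
console.log(queryOf(new URL("http://example.com/?")));
console.log(fragmentOf(new URL("scheme://host/path/#a")));
```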

alwinb avatar Jan 07 '24 19:01 alwinb

> Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?

I've run into this problem myself, in multiple projects and libraries, in both Node & browsers.

Right now I'm building developer tools, where URLs are taken as string input, parsed, and manipulated by component, and preserving the raw formatting where possible is useful. Not being able to differentiate between /? and / at the end of a URL is quite inconvenient! I'm still using Node's url.parse in some places in part because it does not have this behaviour and that's important.

Of course this state does exist within the URL parser (the URL's internal query and fragment states in the spec do store empty & null differently) but it's just not currently exposed the same way in search & hash (in both cases, both null and empty are exposed as '').

Totally understand that changing the existing API is impractical. Either of the options proposed here so far would work well in scenarios like mine:

  • hasSearch and hasHash booleans to distinguish no-delimiter vs delimiter-but-empty-value (or has{Search,Hash}Delimiter, if we want to be even more explicit)
  • query & fragment fields that do always include the delimiter as it was originally parsed, so they're set even if the value itself is empty

The latter is definitely more convenient as a user (fullPath = url.pathname + url.query + url.fragment would effectively reproduce the original relative url components - which it does not do today!) but both are workable, and the confusion of two very similar fields with almost always identical values might not be worthwhile.
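To make the fullPath point concrete, here is a sketch comparing today's getters with delimiter-preserving values derived from href. rawQueryAndFragment is a hypothetical helper standing in for the proposed fields, and the example URL is made up.

```javascript
// Hypothetical helper: returns query and fragment including their delimiters,
// or "" when the delimiter is absent.
function rawQueryAndFragment(url) {
  const href = url.href;
  const hashAt = href.indexOf("#");
  const beforeFragment = hashAt === -1 ? href : href.slice(0, hashAt);
  const queryAt = beforeFragment.indexOf("?");
  return {
    query: queryAt === -1 ? "" : beforeFragment.slice(queryAt),
    fragment: hashAt === -1 ? "" : href.slice(hashAt),
  };
}

const parsed = new URL("http://example.com/path?#");

// Today: the empty query and fragment delimiters are lost.
const today = parsed.pathname + parsed.search + parsed.hash;

// With delimiter-preserving values the trailing "?#" survives.
const { query, fragment } = rawQueryAndFragment(parsed);
const preserved = parsed.pathname + query + fragment;

console.log(today);
console.log(preserved);
```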

pimterry avatar Aug 21 '24 12:08 pimterry