URI.js icon indicating copy to clipboard operation
URI.js copied to clipboard

Apostrophe not parsed correctly

Open kode54 opened this issue 6 years ago • 8 comments

Another project using this parser produces URIs that include apostrophes, even when they're not desirable:

https://www.google.com's

kode54 avatar Jul 11 '17 02:07 kode54

Thank you for raising this issue. I'm afraid this is not a problem with URI.js, but rather a false expectation of how URLs work. Because 's is a valid path (explanation below) it should be part of the URL returned by URI.withinString().

The character 0x27 (', single quote) is considered a sub-delim (RFC3986 Section 2.2 Reserved Characters). It is a valid character in the path segment (RFC3986 Section 3.3 Path) by following the ABNF:

  • path > path-noscheme > segment-nz-nc > 1*( unreserved / pct-encoded / sub-delims / "@" ) > '
  • path > path-rootless > segment-nz > 1*pchar > sub-delims > '

The URL Spec seems to agree with this. Try running new URL("https://www.google.com's") in your browser's DevTools.

rodneyrehm avatar Jul 11 '17 07:07 rodneyrehm

@rodneyrehm I think it's fairly safe to assume that paths should begin with a slash (after the domain). I tested this in Chrome, and while it certainly parses it in URL as part of the origin, the omnibox always executes a search for it instead of trying to navigate.

xPaw avatar Jul 11 '17 09:07 xPaw

While I agree, I'd rather not force that assumption on everyone. But you could ignore trailing 's while scanning for URLs by changing URI.findUri.trim accordingly:

var text = "having fun with http://example.org's URL";
var replace = function(url) {
  return '<a>' + url + '</a>';
}

URI.withinString(text, replace)
// "having fun with <a>http://example.org's</a> URL"

URI.findUri.trim = /[`!()\[\]{};:'".,<>?«»“”„‘’]+$|'s$/;

URI.withinString(text, replace)
// "having fun with <a>http://example.org</a>'s URL"

rodneyrehm avatar Jul 11 '17 10:07 rodneyrehm

That would only fix the exact case of 's, I don't think that's good enough. Perhaps there's a way to specify that a slash is required for path to be parsed?

I also noticed that balanced quotes do not work if there isn't a space after the second quite, like so: "http://google.com"s

xPaw avatar Jul 11 '17 10:07 xPaw

That would only fix the exact case of 's, I don't think that's good enough.

what else do you expect to be considered here?

I also noticed that balanced quotes do not work if there isn't a space after the second quite, like so: "http://google.com"s

that's because the URL now technically ends with s, so the second " is considered to be a part of the URL itself. Case of garbage-in-garbage-out?

rodneyrehm avatar Jul 11 '17 10:07 rodneyrehm

Case of garbage-in-garbage-out?

Pretty much, sadly we have to deal with garbage when parsing user input in a chat client.

Perhaps there's a way to specify that a slash is required for path to be parsed?

Is there a way to accomplish this?

xPaw avatar Jul 11 '17 10:07 xPaw

Pretty much, sadly we have to deal with garbage when parsing user input in a chat client.

however, there are limits to what you can and should do.

Perhaps there's a way to specify that a slash is required for path to be parsed?

withinString() does not actually parse the URL. It only searches for strings that look like an URL. You could deal with this in the callback function:

var text = ' http://google.com/\'s http://google.com\'s "http://google.com"s ';
URI.withinString(text, function(url) {
  // try to handle garbage input here
  console.log(url);

  return '<a>' + url + '</a>';
})

rodneyrehm avatar Jul 11 '17 11:07 rodneyrehm

Another way to look at it is: are apostrophes allowed in TLDs? If not then could it be possible to not treat 's as part of the URL when it's located within the TLD? Sure, it won't solve this for other URLs like http://10.0.0.1's, but it would solve the issue in 99% of daily use cases of URI.withinString.

EDIT: Well, http://example.com/my-path's would also be a fail, but oh well...

astorije avatar Jul 12 '17 05:07 astorije