URI.js
URI.js copied to clipboard
Apostrophe not parsed correctly
Another project using this parser produces URIs that include apostrophes, even when they're not desirable:
https://www.google.com's
Thank you for raising this issue. I'm afraid this is not a problem with URI.js, but rather a false expectation of how URLs work. Because 's
is a valid path (explanation below) it should be part of the URL returned by URI.withinString().
The character 0x27 ('
, single quote) is considered a sub-delim (RFC3986 Section 2.2 Reserved Characters). It is a valid character in the path segment (RFC3986 Section 3.3 Path) by following the ABNF:
- path > path-noscheme > segment-nz-nc > 1*( unreserved / pct-encoded / sub-delims / "@" ) >
'
- path > path-rootless > segment-nz > 1*pchar > sub-delims >
'
The URL Spec seems to agree with this. Try running new URL("https://www.google.com's")
in your browser's DevTools.
@rodneyrehm I think it's fairly safe to assume that paths should begin with a slash (after the domain). I tested this in Chrome, and while it certainly parses it in URL
as part of the origin, the omnibox always executes a search for it instead of trying to navigate.
While I agree, I'd rather not force that assumption on everyone. But you could ignore trailing 's
while scanning for URLs by changing URI.findUri.trim
accordingly:
var text = "having fun with http://example.org's URL";
var replace = function(url) {
return '<a>' + url + '</a>';
}
URI.withinString(text, replace)
// "having fun with <a>http://example.org's</a> URL"
URI.findUri.trim = /[`!()\[\]{};:'".,<>?«»“”„‘’]+$|'s$/;
URI.withinString(text, replace)
// "having fun with <a>http://example.org</a>'s URL"
That would only fix the exact case of 's
, I don't think that's good enough. Perhaps there's a way to specify that a slash is required for path to be parsed?
I also noticed that balanced quotes do not work if there isn't a space after the second quite, like so: "http://google.com"s
That would only fix the exact case of
's
, I don't think that's good enough.
what else do you expect to be considered here?
I also noticed that balanced quotes do not work if there isn't a space after the second quite, like so:
"http://google.com"s
that's because the URL now technically ends with s
, so the second "
is considered to be a part of the URL itself. Case of garbage-in-garbage-out?
Case of garbage-in-garbage-out?
Pretty much, sadly we have to deal with garbage when parsing user input in a chat client.
Perhaps there's a way to specify that a slash is required for path to be parsed?
Is there a way to accomplish this?
Pretty much, sadly we have to deal with garbage when parsing user input in a chat client.
however, there are limits to what you can and should do.
Perhaps there's a way to specify that a slash is required for path to be parsed?
withinString()
does not actually parse the URL. It only searches for strings that look like an URL. You could deal with this in the callback function:
var text = ' http://google.com/\'s http://google.com\'s "http://google.com"s ';
URI.withinString(text, function(url) {
// try to handle garbage input here
console.log(url);
return '<a>' + url + '</a>';
})
Another way to look at it is: are apostrophes allowed in TLDs? If not then could it be possible to not treat 's
as part of the URL when it's located within the TLD?
Sure, it won't solve this for other URLs like http://10.0.0.1's
, but it would solve the issue in 99% of daily use cases of URI.withinString
.
EDIT: Well, http://example.com/my-path's
would also be a fail, but oh well...