invidious
Exclude non-url characters in text auto links
In text like (visit my website at https://example.com), I sometimes see the closing parenthesis get included in the link. Markdown autolinks don't have this issue, and I think their algorithm is widespread enough that most people find its behavior intuitive.
This is the code in question:
https://github.com/iv-org/invidious/blob/81ca8314396524e9a51901a70dfb86b99d6c7cf6/src/invidious/comments/content.cr#L12-L27
Yeah, it's not really a duplicate: we don't need full Markdown. The suggestion here is just to include fewer characters in links in some cases.
For reference, here's what the CommonMark spec says:
https://spec.commonmark.org/0.31.2/#autolink
A URI autolink consists of <, followed by an absolute URI followed by >. It is parsed as a link to the URI, with the URI as the link’s label.
An absolute URI, for these purposes, consists of a scheme followed by a colon (:) followed by zero or more characters other than ASCII control characters, space, <, and >. If the URI includes these characters, they must be percent-encoded (e.g. %20 for a space).
For purposes of this spec, a scheme is any sequence of 2–32 characters beginning with an ASCII letter and followed by any combination of ASCII letters, digits, or the symbols plus (“+”), period (“.”), or hyphen (“-”).
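To make the quoted definitions concrete, here is a minimal Python sketch of a regex that follows them. It is only an illustration of the spec text, not the code Invidious uses (that is Crystal), and the names `SCHEME`, `URI_AUTOLINK`, and `find_uri_autolinks` are made up for this example.

```python
import re

# Scheme: an ASCII letter followed by 1-31 ASCII letters, digits, '+', '.',
# or '-' (2-32 characters in total), per the spec text quoted above.
SCHEME = r"[A-Za-z][A-Za-z0-9+.\-]{1,31}"

# Absolute URI: scheme, ':', then zero or more characters other than
# ASCII control characters (0x00-0x1F, 0x7F), space, '<', and '>'.
# The whole URI must be wrapped in '<' and '>'.
URI_AUTOLINK = re.compile(r"<(" + SCHEME + r":[^\x00-\x20\x7f<>]*)>")


def find_uri_autolinks(text: str) -> list[str]:
    """Return the URIs of all angle-bracket autolinks found in text."""
    return URI_AUTOLINK.findall(text)


if __name__ == "__main__":
    print(find_uri_autolinks("see <https://example.com/a.md> and <not a link>"))
```

Note that this only covers the angle-bracket form; a bare URL in running text is not matched by it.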
Implementing this should only require making the regex less permissive.
I don't think we really need to follow the Markdown specification; we just need to make sure we properly exclude non-URL characters.
I wasn't clear: I'm not referring to links in angle brackets but to automatically detected links, such as those described here: https://github.com/mattcone/markdown-guide/blob/master/_extended-syntax/automatic-url-linking.md. I would expect the rules to be a bit different. For example, I wrote a period after the link I just pasted here, and it was smartly not included. That doesn't mean periods are invalid in URLs, though, as the period in .md shows.
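A rough sketch of the behavior described here, in Python for illustration only: match a generous candidate, then trim trailing characters that are valid in URLs but unlikely to be intended as the last character. The trimming rules are loosely modelled on the extended autolink rules in GitHub Flavored Markdown and are simplified; `CANDIDATE`, `TRAILING`, `trim_trailing`, and `find_bare_urls` are hypothetical names, and restricting candidates to http(s) is an assumption.

```python
import re

# Candidate: an http(s) scheme followed by a run of characters that are
# neither whitespace nor '<'. Deliberately permissive; trimming comes next.
CANDIDATE = re.compile(r"https?://[^\s<]+")

# Punctuation that may appear inside a URL but is rarely meant as its end.
TRAILING = "?!.,:*_~"


def trim_trailing(url: str) -> str:
    """Strip trailing punctuation and unbalanced closing parentheses."""
    while url:
        if url[-1] in TRAILING:
            url = url[:-1]
        elif url[-1] == ")" and url.count(")") > url.count("("):
            # More ')' than '(' overall: the last ')' belongs to the prose.
            url = url[:-1]
        else:
            break
    return url


def find_bare_urls(text: str) -> list[str]:
    """Return bare URLs in text with trailing non-URL characters removed."""
    return [trim_trailing(m.group()) for m in CANDIDATE.finditer(text)]


if __name__ == "__main__":
    print(find_bare_urls("(visit my website at https://example.com)"))
    print(find_bare_urls("Docs: https://example.com/automatic-url-linking.md."))
```

With this approach the period inside `.md` is kept while a sentence-ending period is dropped, and a URL with balanced parentheses, such as a Wikipedia link ending in `_(programming_language)`, keeps its closing parenthesis.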